Office space meme:
“If y’all could stop calling an LLM ‘open source’ just because they published the weights… that would be great.”
Or more realistically: a description of how you could source the data.
Correct. Llama isn’t open source, either.
Not at all. It’s like claiming an emulator is open source because it has a plugin system, while you still need a closed-source build dependency that the developer doesn’t disclose to the public.
Build dependency… so you don’t have a problem with the LLM at all! You have a problem with the data collection process or the pre-training! So an emulator can’t be open source if the developers never disclosed how they figured out how to read Nintendo ROMs? Or which games were dissected to reverse engineer that info? I don’t consider that a prerequisite for calling an emulator open source.
So if I, say… removed the data set from DeepSeek, would you consider what remains open source?
No. The emulator is open source if it provides a way to build the binary in the end. I don’t know how else to explain it to you: no LLM is open source.
So I still don’t see your issue with DeepSeek, because just like an emulator, everything is open source except the data. The end result depends on the ROM you put into it, and you can always make your own ROM if you have the tools and the result follows the expected format. And if the ROM is removed, the emulator is still the emulator.
So if DeepSeek removed its data set, would you then consider DeepSeek open source?
If I distribute a set consisting of an emulator and a ROM of a closed-source game (without the source code), then the full set is not open source.
Kind of, but that’s like expecting a console to ship without any firmware. The weights are the important part of an LLM distribution.
So like an emulator. Or at least the PS2 ones, where you had to dump the BIOS from your own machine (or snatch someone else’s).
But that’s my point! The data set is interchangeable, so it’s not what makes DeepSeek THE DeepSeek LLM. But without the data set it would be functionally useless, and there would be no possible way to satisfy your requirement for data set openness. You said there is some line in the sand where you might be satisfied with some amount of the data, but your argument says the granularity must be absolute to justify calling it open source. You demand an impossible, unnecessary standard that other open source projects are not held to.
Just to add, a good chunk of newer emulators require you to obtain a firmware dump externally, not just the PS2 ones. Pretty much anything from the PS2 onwards is like that.
The engine is open source, the model is not.
The emulator is open source, the games it can run are not.
I don’t see how it’s so hard to understand.
They are saying that the model the engine is running is open source because they released the model. That’s like saying a game is open source because I released an emulator and the executable file. It’s just not true.