Office Space meme:

“If y’all could stop calling an LLM ‘open source’ just because they published the weights… that would be great.”

  • Prunebutt@slrpnk.net (OP)
    10 days ago

    everything is open source, with the exception of the data

    If I distribute a set consisting of an emulator and a ROM of a closed-source game (without the source code), then the full set is not open source.

    So if DeepSeek removed its data set, would you then consider DeepSeek open source?

    Kind of, but that’s like expecting a console without any firmware. The weights are the important bit of an LLM distribution.

    • WraithGear@lemmy.world
      10 days ago

      So like an emulator. Or at least the PS2 ones, where you had to dump the BIOS from your own machine (or snatch someone else’s).

      But that’s my point! The data set is interchangeable, so it’s not what makes DeepSeek THE DeepSeek LLM. But without the data set it would be functionally useless, and there would be no possible way to satisfy your requirement for data set openness. You said there is some line in the sand where you might be satisfied with some amount of the data, but your argument states that granularity must be absolute in order to justify calling it open source. You demand an impossible, unnecessary standard that other open source projects are not held to.

      • mamotromico@lemmy.ml
        10 days ago

        Just to add, a good chunk of newer emulators require you to get a dump of the firmware externally, not just the PS2 ones. Pretty much anything from the PS2 onwards is like that.