• FarceOfWill@infosec.pub
    English
    23 hours ago

    Because once you can generate the GPL code from the lossy AI database trained on it, the GPL protection is meaningless.

    • Grimy@lemmy.world
      English
      18 hours ago

      In such a scenario, it will be worth it. LLMs aren’t databases that just hold copy-pasted information. If we get to a point where one can spit out whole functional GitHub repositories replicating complex software, it will be able to do so with most software, regardless of whether it was trained on similar data.

      All software will be a prompt away, including closed-source software. I don’t think you can get more open source than that. But that’s only if stringent laws aren’t put in place to ban open-source AI models, since Google will put that one prompt behind a paycheck’s worth of money if they can.

      • FarceOfWill@infosec.pub
        English
        7 hours ago

        I don’t see how you can write the law so that it allows training AI on copyrighted data without also making it possible to train a special LLM on a single GitHub repository instead of the entire universe, and essentially treat it as a full compression of the source.

        • Grimy@lemmy.world
          English
          39 minutes ago

          The outputs are still bound by copyright law. Tracing over an artwork pixel by pixel doesn’t make the result immune to copyright law, and maliciously overtraining gen AI to act like a database and outright copy shouldn’t either.

          If you have a carbon copy of someone’s GitHub repository, it doesn’t matter that you generated it; it’s still a copy. Code is a difficult example, though, since I’m not entirely sure where the line is for one repo to be different from another when they accomplish the same task.

          I always imagined businesses just grabbed the GPL software and told their employees to rewrite it, but differently. Most things I dig into seem to stem from one or two algorithms from a paper, and the rest is fluff.