[HN] Anti-Piracy Group Takes AI Training Dataset 'Books3' Offline

irradiated@radiation.party · 1 year ago

AutoTL;DR@lemmings.world · 1 year ago

This is the best summary I could come up with:

The two simply do not mix, and the fumes rising from the surface just need a spark to set the entire concept of intellectual property rights alight.

As first reported by TorrentFreak, the large pirate repository The Eye took down the Books3 dataset after the Danish anti-piracy group Rights Alliance sent the site a DMCA takedown notice.

When he released Books3, its creator, Shawn Presser, said his goal was to open up AI development beyond companies like OpenAI, which trained its earlier large language models on the still-undisclosed "Books1" and "Books2" repositories.

Comedian Sarah Silverman was one of several authors who signed on to a class action lawsuit against Meta, claiming the company stole their books to train its LLaMA AI model.

Training AI models requires an enormous amount of text, and for close to a decade the technology's development has depended on copyrighted material.

Big tech companies are increasingly unwilling to share this data, knowing that the more they disclose, the easier it becomes for others to build similar AI models or to tangle them up in lawsuits.

The original article contains 1,516 words, the summary contains 172 words. Saved 89%. I’m a bot and I’m open source!
