Meta Reportedly Used Thousands Of Pirated Books To Train AI Models Despite Legal Warnings
JAKARTA - Meta Platforms, the parent company of Facebook and Instagram, reportedly used thousands of pirated books to train its artificial intelligence (AI) models, even after its lawyers warned of the copyright risks. The allegation appears in a copyright infringement lawsuit filed by comedian Sarah Silverman and a number of other well-known writers.
The plaintiffs say that Meta used their work without permission to train an AI language model known as Llama. A recent filing, which consolidates two previously filed lawsuits, says that Meta's lawyers had warned the company of the legal risks of using thousands of pirated books to train AI models.
The filing also includes conversations on a Discord server in which Tim Dettmers, a researcher affiliated with Meta, discusses procurement of the dataset. The conversations describe his discussions with Meta's legal department about the legality of using the book files as training data.
The researcher stated that many people at Facebook are interested in working with certain datasets but cannot use them for legal reasons. In the conversation, the researchers also called "books with active copyright" a potential source of problems and claimed that training on the data should qualify as "fair use," a US legal doctrine that protects certain uses of copyrighted works without a license.
The lawsuit comes amid a series of lawsuits filed against tech companies by content creators, who accuse them of using copyrighted work without permission to build generative AI models. If successful, these cases could change the course of generative AI development by compelling AI companies to compensate artists, writers, and other content creators for the use of their works.
Meta has not yet responded to these allegations.