US judge backs using copyrighted books to train AI

Photo: FABRICE COFFRINI - AFP

A US federal judge has sided with Anthropic over its training of artificial intelligence models on copyrighted books without authors' permission, a decision that could set a major legal precedent for AI development.


District Court Judge William Alsup ruled on Monday that the company's training of its Claude AI models on books, whether bought or pirated, was allowed under the "fair use" doctrine in the US Copyright Act.

"Use of the books at issue to train Claude and its precursors was exceedingly transformative and was a fair use," Alsup wrote in his decision.

"The technology at issue was among the most transformative many of us will see in our lifetimes," Alsup added in his 32-page decision, comparing AI training to how humans learn by reading books.

Tremendous amounts of data are needed to train large language models powering generative AI.

Musicians, book authors, visual artists and news publications have sued various AI companies that used their data without permission or payment.

AI companies generally defend their practices by claiming fair use, arguing that training AI on large datasets fundamentally transforms the original content and is necessary for innovation.

"We are pleased that the court recognized that using 'works to train LLMs was transformative,'" an Anthropic spokesperson said in response to an AFP query.

The judge's decision is "consistent with copyright's purpose in enabling creativity and fostering scientific progress," the spokesperson added.

- Blanket protection rejected -

The ruling stems from a class-action lawsuit filed by authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson, who accused Anthropic of illegally copying their books to train Claude, the company's AI chatbot that rivals ChatGPT.

However, Alsup rejected Anthropic's bid for blanket protection, ruling that the company's practice of downloading millions of pirated books to build a permanent digital library was not justified by fair use protections.

Along with downloading books from websites offering pirated works, Anthropic bought copyrighted books, scanned the pages and stored them in digital format, according to court documents.

Anthropic's aim was to amass a library of "all the books in the world" and to train AI models on that content as it saw fit, the judge said in his ruling.

Alsup found that while training AI models on the pirated content posed no legal violation, downloading pirated copies to build a general-purpose library constituted copyright infringement, regardless of eventual training use.

The case will now proceed to trial over the pirated library copies, with Anthropic facing potential financial damages.

Anthropic said it disagreed with going to trial on this part of the decision and was evaluating its legal options.

Valued at $61.5 billion and heavily backed by Amazon, Anthropic was founded in 2021 by former OpenAI executives.

The company, known for its Claude chatbot and AI models, positions itself as focused on AI safety and responsible development.

S.Sosa--ECdLR

El Comercio De La República - US judge backs using copyrighted books to train AI
