The Information

Anthropic’s Use of Books as Training Data Is Fair Use, Says Court

Anthropic did not violate copyright law by training its large language models on copies of books because the training was protected under “fair use,” a federal district court in San Francisco ruled Tuesday.

Authors Andrea Bartz, Charles Graeber and Kirk Johnson sued Anthropic last year, alleging that the AI company violated copyright laws by purchasing and downloading books, including their own, and using them to train its Claude models.

The court ruled that Anthropic’s use of these books in training was “exceedingly transformative,” so it qualified as fair use. “To make anyone pay specifically for the use of a book each time they read it, each time they recall it from memory, each time they later draw upon it when writing new things in new ways would be unthinkable,” the court wrote.

However, the court found that Anthropic, in addition to buying millions of print books, also “pirated” over 7 million books, including from popular sources such as Library Genesis and Pirate Library Mirror. Downloading and storing those copies was not protected by fair use, the court found. Anthropic has argued that it downloaded the books only to develop its AI, which it says is fair use. This claim is set to go to trial.