CA District Court Rules That Copyrighted Content Used for Training AI is ‘Fair Use’

The U.S. District Court for the Northern District of California, in an order issued on June 23, 2025, found that using copyrighted content to train generative AI is transformative and therefore qualifies as “fair use” (Bartz et al. v. Anthropic PBC, No. 3:24-cv-05417-WHA).

Large language models (LLMs) must be trained on vast quantities of data in order to respond appropriately to users’ prompts. That data is often scraped from the web or supplied directly to the models, and some of it consists of copyrighted works, such as books, news articles, or photographs.

The defendant in the case, Anthropic, describes itself as “an AI safety and research company” that builds “reliable, interpretable, and steerable AI systems.” The company was founded in 2021 and is best known for its family of LLMs under the name “Claude.”

At its inception, Anthropic embarked on an ambitious project to copy “all the books in the world” for two purposes: 1) to create a digital library without, as a co-founder put it, all the “legal/practice/business slog,” and 2) to train its Claude AI models.

Anthropic downloaded millions of digitized, copyrighted books from pirate websites to create a central digital library. The company also legally purchased physical copies of copyrighted books, broke them apart, and manually scanned the pages, turning them into digitized files for its library. A subset of these files was then used to help train Claude and improve its performance for users.

The plaintiffs included Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson, all authors of copyrighted fiction and non-fiction books that Anthropic had copied. The authors filed a class action in August 2024 claiming that Anthropic had infringed their copyrights.

Anthropic moved for summary judgment on the question of “fair use.” Fair use is a doctrine codified in Section 107 of the Copyright Act that allows the unlicensed use of copyrighted works “for purposes such as criticism, comment, news reporting, teaching…scholarship, or research” (17 U.S. Code § 107). The statute sets out four factors for courts to weigh in fair use cases, the first being:

Purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes.

Since the 1990s, courts have found that so-called “transformative” uses are more likely to be considered fair. Transformative uses are those that add something new, with a further purpose or different character, and do not substitute for the original work.

In gauging the transformative nature of Anthropic’s use of copyrighted works, the Court came to the following conclusions, separating the legal issues concerning the larger central library of digitized books and the subset of books used to train the AI.

Copies Used to Train Claude

The Court found that using the copies of downloaded and digitized books to train Claude, which did not result in any infringing output, was “spectacularly” transformative and therefore was fair use. The decision stated that “users interacted only with the Claude service, which placed additional software between the user and the underlying LLM to ensure that no infringing output ever reached the users.” The Court compared this case to the 2015 Google Books case, in which Google digitized copyrighted books to build a large, searchable database but limited search results to “snippets” of the actual copyrighted material, thus avoiding infringement claims (Authors Guild v. Google, Inc., No. 13-4829 (2d Cir. 2015)).

Copies Used to Create a General Purpose Library

The digitization of the books purchased in print was fair use because Anthropic merely replaced print copies it had already purchased with more conveniently formatted copies. The Court reiterated that digitizing a legally obtained copy of a work for the purpose of saving storage space and enhancing searchability is transformative fair use. Importantly, Anthropic destroyed the original books it purchased, so each original was fully replaced by its digital copy. In addition, there was no evidence that the library was ever made available to anyone outside the company.

Retaining the downloaded pirated books for Anthropic’s permanent, general purpose library was not fair use and constituted infringement. The Court reasoned that “pirating copies [of books] to build a research library without paying for it, and to retain copies should they prove useful for one thing or another, was its own use — and not a transformative one.”

The parties have been ordered to conduct discovery on the pirated copies for an impending trial; however, it is likely that the case will settle. Furthermore, this order is highly fact-specific; there is no guarantee that a case with a different set of facts involving the unauthorized use of copyrighted content will be decided the same way.

If you have any questions regarding artificial intelligence, copyright law, and fair use, please contact Tara Aaron-Stelluto.