In an order issued on June 23, 2025, the U.S. District Court for the Northern District of California found that Anthropic’s use of copyrighted books to train its generative AI models was transformative and therefore qualified as “fair use.” (Bartz et al. v. Anthropic PBC, No. 3:24-cv-05417-WHA)
Large language models (LLMs) must be trained on vast quantities of data in order to respond appropriately to users’ prompts. Developers gather that data by scraping the web or by feeding content to the models directly. Some of that content consists of copyrighted works, such as books, news articles, or photographs.
The defendant in the case, Anthropic, describes itself as “an AI safety and research company” that builds “reliable, interpretable, and steerable AI systems.” The company was founded in 2021 and is best known for its family of LLMs under the name “Claude.”
At its inception, Anthropic embarked on an ambitious project to copy “all the books in the world” for two purposes: 1) to create a digital library while avoiding what a co-founder called the “legal/practice/business slog,” and 2) to train its Claude AI models.
Anthropic proceeded to download millions of digitized, copyrighted books from pirate websites to create a central digital library. The company also legally purchased physical copies of copyrighted books, broke them apart, and manually scanned the pages, turning them into digitized files for its library. A subset of these files was then used to help train Claude and improve its performance for users.
The plaintiffs, Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson, are authors of copyrighted fiction and non-fiction books that Anthropic had copied. In August 2024, the authors filed a class action claiming that Anthropic had infringed their copyrights.
Anthropic moved for summary judgment on the question of “fair use.” Fair use is a doctrine codified in Section 107 of the Copyright Act that allows the unlicensed use of copyrighted works “for purposes such as criticism, comment, news reporting, teaching…scholarship, or research” (17 U.S.C. § 107). The statute lists four factors for courts to weigh in fair use cases, the first being:
Purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes.
Since the Supreme Court’s 1994 decision in Campbell v. Acuff-Rose Music, Inc., courts have found that so-called “transformative” uses are more likely to be considered fair. Transformative uses are those that add something new, with a further purpose or different character, and do not substitute for the original work.
In gauging the transformative nature of Anthropic’s use of copyrighted works, the Court came to the following conclusions, separating the legal issues concerning the larger central library of digitized books and the subset of books used to train the AI.
Copies Used to Train Claude
Copies Used to Create a General Purpose Library
The parties have been ordered to conduct discovery on the pirated copies ahead of an impending trial; however, it is likely that the case will settle. Furthermore, this order is highly fact-specific; there is no guarantee that a case with a different set of facts involving the unauthorized use of copyrighted content will be decided the same way.
If you have any questions regarding artificial intelligence, copyright law, or fair use, please contact Tara Aaron-Stelluto.