Sunday Splits
Serving You Circuit Splits Every Sunday
Sai Mamidala | Is the Use of Copyrighted Works to Train AI Models Protected Under Fair Use?
Major AI companies around the world train their models on copyrighted material. The reason is simple: an AI model is only as good as the data used to train it, and building a useful model requires an enormous quantity of that data – far more than any company can feasibly produce on its own. So companies like Anthropic, Google, and Meta turned to what was already available: books, articles, images, music, and code created by other people. Although training requires a model to ingest entire original works, the model does not retain those works as copies, and they are generally not retrievable from it. Rather, the process extracts statistical patterns about language, such as word relationships, syntax, and structure, and compresses them into numerical parameters that drive future outputs. That mechanical reality is what makes the question of whether AI training qualifies as a transformative fair use so interesting, and the federal courts that have tried to answer it have landed in strikingly different places. Because AI’s recent, explosive growth has outpaced the litigation surrounding it, what we have is not yet a mature circuit split but an emerging one headed for circuit-level resolution.