This presentation gives an overview of interactions between United States (U.S.) copyright law and artificial intelligence (AI). Current generative AI differs from earlier big data work in that current AI models train on full-text, long-form written works, whereas in the past datasets and structured information held the highest value. In recent years there has also been extensive economic activity around AI, which has surfaced new business-oriented issues. As a result, new issues have arisen regarding copyright law and AI.
Throughout 2023 and 2024, the U.S. Copyright Office held listening sessions on specific topics related to generative AI and copyright. The Copyright Office has released guidance on digital replicas (i.e., deepfakes), with guidance yet to come on additional topics, including copyrightability of works incorporating AI-generated material, training AI models on copyrighted works, licensing considerations, and liability issues. This presentation gives a just-the-facts summary of U.S. Copyright Office activities and of emergent case law from lawsuits related to AI and copyright.
A current parallel regulatory thrust is comprehensive U.S. federal regulation of AI ethics. The National Artificial Intelligence Initiative Act of 2020 provided funding for a five-year rollout of AI regulation. With AI ethics regulation in rapid development in the U.S., and legally binding outcomes pending in the near future, large corporations building AI tools have a strong incentive to control the conversation and define AI ethics. By emphasizing copyright, corporations might seek to shift emphasis away from other ethical issues, such as the impacts algorithmic decision making has on people’s lives and increased surveillance. For example, ethics discussion may be steered towards copyright law, and “ethical AI” co-opted to refer to training AI models on licensed content. This presentation considers ethics more broadly, and invites participants to consider how increased focus on copyright and ethics might distract from other ethical issues.
Licensing dovetails with copyright in that contractual obligations or rights can shift what is allowable: they can limit fair use or expand what is allowed. Generative AI is largely controlled by a handful of very large corporations. High-quality training data, such as scholarly articles and other high-quality material written by people, tends to be controlled by not-quite-as-large corporations. For example, Google’s market cap is 2.3 trillion U.S. dollars, more than 600 times Clarivate’s market cap of 3.6 billion U.S. dollars. To maintain control of their assets in light of a potentially lucrative new use, academic database providers might tend to contractually limit established fair uses such as text mining. This presentation touches on text mining as fair use and on trends in licensing restrictions.
This presentation gives an overview of recent trends in U.S. copyright law and AI, with emphasis on developing federal regulations and guidance, AI ethics, and the legal right of scholars to do text mining as impacted by changing licensing practices.