Plaintiffs and defendants in copyright lawsuits over generative AI often make sweeping, opposing claims about the extent to which large language models (LLMs) have memorized plaintiffs' protected expression. Drawing on adversarial ML and copyright law, we show that these polarized positions dramatically oversimplify the relationship between memorization and copyright. To do so, we leverage a recent probabilistic extraction technique to extract pieces of the Books3 dataset from 13 open-weight ... | arXiv.org
New research could have big implications for copyright lawsuits against generative AI. | www.understandingai.org
The New York Times might win its copyright lawsuit against OpenAI. | www.understandingai.org