Creative Writers' Attitudes on Writing as Training Data for Large Language Models
Katy Ilonka Gero, Meera Desai, Carly Schnitzler, Nayun Eom, Jack Cushman, Elena L. Glassman
TL;DR
The paper investigates how creative writers view the use of their writing as training data for large language models, a pressing ethical and policy question. From a grounded-theory analysis of 33 interviews with writers across genres and publication contexts, it identifies three core principles (The Creative Chain, Respect, and The Human Element) that shape writers’ reasoning, along with three realistic expectations (Lack of Control, Industry Impacts, and Interpretation of Scale) that often conflict with the current dynamics of AI development. It argues for viewing LLMs through a data-colonialism lens and for a library-inspired rethinking of them, and it suggests writer-led governance, transparency, and feasible opt-out or compensation mechanisms as practical directions. The findings aim to inform policy, licensing, and technical design so that AI development in the literary domain aligns with human-centered values and equitable data practices.
Abstract
The use of creative writing as training data for large language models (LLMs) is highly contentious, and many writers have expressed outrage at the use of their work without consent or compensation. In this paper, we seek to understand how creative writers reason about the real or hypothetical use of their writing as training data. We interviewed 33 writers with variation across genre, method of publishing, degree of professionalization, and attitudes toward and engagement with LLMs. We report on core principles that writers express (support of the creative chain, respect for writers and writing, and the human element of creativity) and how these principles can be at odds with their realistic expectations of the world (a lack of control, industry-scale impacts, and interpretation of scale). Collectively, these findings demonstrate that writers have a nuanced understanding of LLMs and are more concerned with power imbalances than with the technology itself.
