Frankentext: Stitching random text fragments into long-form narratives
Chau Minh Pham, Jenna Russell, Dzung Pham, Mohit Iyyer
TL;DR
Frankentexts are long-form narratives generated under a strict verbatim-copy constraint, with an LLM acting as a composer over a large pool of human-written snippets. The approach uses a three-stage pipeline (draft generation at a fixed copy rate, minimal editing, and optional agent-assisted retrieval via MCP) to implicitly navigate the combinatorial space of snippet arrangements. Across multiple models and dataset configurations, Frankentexts achieve higher writing quality and diversity than vanilla baselines while often evading automated detectors, though they incur high computational cost and raise authorship and copyright questions. The work also provides detailed metrics, datasets, and token-level annotations to support future research on mixed-authorship detection and provenance, positioning Frankentexts as a challenging but informative test bed for perception, detection, and policy discussions around AI-assisted writing.
Abstract
We introduce Frankentexts, a long-form narrative generation paradigm that treats an LLM as a composer of existing texts rather than as an author. Given a writing prompt and thousands of randomly sampled human-written snippets, the model is asked to produce a narrative under the extreme constraint that most tokens (e.g., 90%) must be copied verbatim from the provided paragraphs. This task is effectively intractable for humans: selecting and ordering snippets yields a combinatorial search space that an LLM implicitly explores, before minimally editing and stitching together selected fragments into a coherent long-form story. Despite the extreme challenge of the task, we observe through extensive automatic and human evaluation that Frankentexts significantly improve over vanilla LLM generations in terms of writing quality, diversity, and originality while remaining coherent and relevant to the prompt. Furthermore, Frankentexts pose a fundamental challenge to detectors of AI-generated text: 72% of Frankentexts produced by our best Gemini 2.5 Pro configuration are misclassified as human-written by Pangram, a state-of-the-art detector. Human annotators praise Frankentexts for their inventive premises, vivid descriptions, and dry humor; on the other hand, they identify issues with abrupt tonal shifts and uneven grammar across segments, particularly in longer pieces. The emergence of high-quality Frankentexts raises serious questions about authorship and copyright: when humans provide the raw materials and LLMs orchestrate them into new narratives, who truly owns the result?
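To make the verbatim-copy constraint concrete, the following is a minimal sketch of how a copy rate could be estimated for a draft against a snippet pool. It is not the metric used in the paper: the whitespace tokenizer, the `min_span` threshold, and the function names `tokenize` and `copy_rate` are all assumptions introduced for illustration.

```python
from typing import Iterable, List, Set, Tuple


def tokenize(text: str) -> List[str]:
    # Assumption: lowercase whitespace tokenization, chosen only for simplicity.
    return text.lower().split()


def copy_rate(draft: str, snippets: Iterable[str], min_span: int = 4) -> float:
    """Fraction of draft tokens that lie inside at least one run of
    `min_span` consecutive tokens appearing contiguously in some snippet."""
    draft_toks = tokenize(draft)
    if not draft_toks:
        return 0.0

    # Index every contiguous min_span-gram found in the snippet pool.
    pool: Set[Tuple[str, ...]] = set()
    for snippet in snippets:
        toks = tokenize(snippet)
        for i in range(len(toks) - min_span + 1):
            pool.add(tuple(toks[i:i + min_span]))

    # Mark each draft token covered by a matching min_span-gram.
    covered = [False] * len(draft_toks)
    for i in range(len(draft_toks) - min_span + 1):
        if tuple(draft_toks[i:i + min_span]) in pool:
            for j in range(i, i + min_span):
                covered[j] = True

    return sum(covered) / len(draft_toks)


if __name__ == "__main__":
    # Hypothetical snippets and draft, invented for this example.
    pool_snippets = [
        "the old house stood at the end of the lane",
        "she never looked back at the city",
    ]
    story = "the old house stood at the end of the lane but she never looked back"
    print(f"estimated copy rate: {copy_rate(story, pool_snippets):.2f}")
```

Under this sketch, a draft would satisfy a 90% target when `copy_rate` returns at least 0.9. The `min_span` threshold keeps short common phrases from counting as copied text. A real implementation would need a proper tokenizer and a more careful alignment to the source snippets.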
