Frankentext: Stitching random text fragments into long-form narratives

Chau Minh Pham, Jenna Russell, Dzung Pham, Mohit Iyyer

TL;DR

Frankentexts address long-form narrative generation under a strict verbatim-copy constraint by leveraging an LLM as a text-composer over a massive pool of human-written snippets. The approach uses a three-stage pipeline—draft generation with a fixed copy rate, minimal editing, and optional agent-assisted retrieval via MCP—to implicitly navigate a combinatorial space of snippet arrangements. Across multiple models and dataset configurations, Frankentexts achieve higher writing quality and diversity than vanilla baselines while often evading automated detectors, though they incur high computational cost and raise authorship and copyright questions. The work also provides detailed metrics, datasets, and token-level annotations to aid future mixed-authorship detection and provenance research, positioning Frankentexts as a challenging but informative test bed for perception, detection, and policy discussions around AI-assisted writing.

Abstract

We introduce Frankentexts, a long-form narrative generation paradigm that treats an LLM as a composer of existing texts rather than as an author. Given a writing prompt and thousands of randomly sampled human-written snippets, the model is asked to produce a narrative under the extreme constraint that most tokens (e.g., 90%) must be copied verbatim from the provided paragraphs. This task is effectively intractable for humans: selecting and ordering snippets yields a combinatorial search space that an LLM implicitly explores, before minimally editing and stitching together selected fragments into a coherent long-form story. Despite the extreme challenge of the task, we observe through extensive automatic and human evaluation that Frankentexts significantly improve over vanilla LLM generations in terms of writing quality, diversity, and originality while remaining coherent and relevant to the prompt. Furthermore, Frankentexts pose a fundamental challenge to detectors of AI-generated text: 72% of Frankentexts produced by our best Gemini 2.5 Pro configuration are misclassified as human-written by Pangram, a state-of-the-art detector. Human annotators praise Frankentexts for their inventive premises, vivid descriptions, and dry humor; on the other hand, they identify issues with abrupt tonal shifts and uneven grammar across segments, particularly in longer pieces. The emergence of high-quality Frankentexts raises serious questions about authorship and copyright: when humans provide the raw materials and LLMs orchestrate them into new narratives, who truly owns the result?
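The setup described in the abstract (sample random human-written paragraphs, prompt the model with a verbatim-copy budget, then check how much of the draft is actually copied) can be illustrated with a minimal sketch. The function names `sample_snippets`, `build_prompt`, and `approx_copy_rate`, the prompt wording, and the word-level n-gram overlap used to estimate the copy rate are all assumptions made for illustration; they are not the authors' prompts or their copy-rate metric.

```python
import random


def sample_snippets(corpus, k, seed=0):
    """Draw up to k paragraphs uniformly at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(corpus, min(k, len(corpus)))


def build_prompt(writing_prompt, snippets, copy_rate=0.9):
    """Assemble a hypothetical instruction with a verbatim-copy budget."""
    numbered = "\n\n".join(f"[{i + 1}] {s}" for i, s in enumerate(snippets))
    return (
        f"Writing prompt: {writing_prompt}\n\n"
        f"Human-written paragraphs:\n{numbered}\n\n"
        f"Write a story in which about {copy_rate:.0%} of the text is copied "
        "verbatim from the paragraphs above."
    )


def _ngrams(tokens, n):
    """Return the set of word n-grams in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def approx_copy_rate(draft, snippets, n=4):
    """Fraction of draft words covered by an n-gram that also occurs in a snippet.

    This is a rough proxy (an assumption of this sketch), not the paper's metric.
    """
    draft_tokens = draft.split()
    if len(draft_tokens) < n:
        return 0.0
    pool = set()
    for snippet in snippets:
        pool |= _ngrams(snippet.split(), n)
    covered = [False] * len(draft_tokens)
    for i in range(len(draft_tokens) - n + 1):
        if tuple(draft_tokens[i:i + n]) in pool:
            for j in range(i, i + n):
                covered[j] = True
    return sum(covered) / len(draft_tokens)
```

A draft whose estimated rate falls well below the requested budget would be a candidate for regeneration or further editing; the threshold for that decision is left open here.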

Paper Structure

This paper contains 64 sections, 26 figures, 14 tables, 1 algorithm.

Figures (26)

  • Figure 1: The Frankentexts pipeline. First, random paragraphs are sampled from a large corpus of human-written books. Then an LLM is prompted with the paragraphs, a writing prompt, and instructions to include a certain amount of human text verbatim, and generates the first draft of a Frankentext, which is then edited into a coherent and faithful final version (see Algorithm 1).
  • Figure 2: Average human ratings on a Likert scale from 1 to 7 for vanilla generations versus Frankentexts + 5K. Frankentexts achieve higher scores across all dimensions.
  • Figure 3: Effects of varying the percentage of required verbatim copy on the Pangram AI detection rate (mixed, highly likely, and likely AI labels), the copy rate, and the coherence of the Frankentexts.
  • Figure 4: Example of the consent form provided to participants.
  • Figure 5: Label Studio single-story annotation interface.
  • ...and 21 more figures