Who Wrote the Book? Detecting and Attributing LLM Ghostwriters

Anudeex Shetty, Qiongkai Xu, Olga Ohrimenko, Jey Han Lau

Abstract

In this paper, we introduce GhostWriteBench, a dataset for LLM authorship attribution. It comprises long-form texts (50K+ words per book) generated by frontier LLMs, and is designed to test generalisation across multiple out-of-distribution (OOD) dimensions, including domain and unseen LLM author. We also propose TRACE -- a novel fingerprinting method that is interpretable and lightweight -- that works for both open- and closed-source models. TRACE creates the fingerprint by capturing token-level transition patterns (e.g., word rank) estimated by another lightweight language model. Experiments on GhostWriteBench demonstrate that TRACE achieves state-of-the-art performance, remains robust in OOD settings, and works well in limited training data scenarios.
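At a high level, the fingerprinting pipeline the abstract describes (score tokens with a lightweight language model, capture transition patterns over those scores, then attribute by comparing against per-LLM reference fingerprints) can be sketched as below. This is a rough illustration only: the rank binning, L1 distance, and all function names are assumptions for the sketch, not the paper's actual TRACE implementation.

```python
import numpy as np

def rank_transition_fingerprint(token_ranks, num_bins=5, max_rank=1000):
    """Build a toy rank-based fingerprint from a sequence of token ranks.

    `token_ranks` are the ranks each token received under some scoring LM.
    Ranks are binned on a log scale, and the fingerprint is the normalised
    matrix of transitions between consecutive bins.
    """
    ranks = np.asarray(token_ranks)
    bins = np.minimum(
        (np.log1p(ranks) / np.log1p(max_rank) * num_bins).astype(int),
        num_bins - 1,
    )
    matrix = np.zeros((num_bins, num_bins))
    for a, b in zip(bins[:-1], bins[1:]):
        matrix[a, b] += 1
    total = matrix.sum()
    return matrix / total if total else matrix

def attribute(test_fp, reference_fps):
    """Attribute the test text to the LLM with the closest reference
    fingerprint (here, smallest L1 distance)."""
    return min(reference_fps,
               key=lambda name: np.abs(test_fp - reference_fps[name]).sum())

# Toy usage: two hypothetical "LLM authors" with different rank habits.
refs = {
    "model_a": rank_transition_fingerprint([0, 500, 0, 500, 0]),
    "model_b": rank_transition_fingerprint([0, 0, 0, 0, 0]),
}
test_fp = rank_transition_fingerprint([0, 500, 0, 500])
print(attribute(test_fp, refs))  # closest reference fingerprint wins
```

In the paper's setting, the ranks would come from a real lightweight language model scoring each token of a long text, and a pool of reference fingerprints would be built per LLM from its training texts.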

Paper Structure

This paper contains 49 sections, 6 equations, 17 figures, 16 tables, and 2 algorithms.

Figures (17)

  • Figure 1: Overview of TRACE fingerprint construction. We model transitions of consecutive token scores (using another language model). The process is shown for both the Rank-based and Entropy-based fingerprint variants. We construct a pool of reference fingerprints from an LLM's training texts. At inference time, attribution is a straightforward comparison between the test fingerprint and the LLM reference fingerprints.
  • Figure 2: An example where TRACE captures similar gpt-4.1 fingerprints across domains, demonstrating generalisation. Left: a training reference fingerprint; Middle: a test sample from a seen domain (HB); Right: a test sample from an unseen domain (SSP, LHH).
  • Figure 3: Normalised OOD-Author confusion matrix for the Entropy-based TRACE variant using norm-mean. The confusion matrices for the other methods can be found in the Appendix.
  • Figure 4: The impact of using different fingerprint sizes for TRACE. The red dashed line denotes the selected configuration.
  • Figure 5: The impact of $\alpha$ in power law approximation for Rank-based TRACE. The red dashed line denotes the chosen $\alpha=1.5$.
  • ...and 12 more figures