Narrative Fingerprints: Multi-Scale Author Identification via Novelty Curve Dynamics

Fred Zimmerman, Hilmar AI

Abstract

We test whether authors have characteristic "fingerprints" in the information-theoretic novelty curves of their published works. Working with two corpora -- Books3 (52,796 books, 759 qualifying authors) and PG-19 (28,439 books, 1,821 qualifying authors) -- we find that authorial voice leaves measurable traces in how novelty unfolds across a text. The signal is multi-scale: at book level, scalar dynamics (mean novelty, speed, volume, circuitousness) identify 43% of authors significantly above chance; at chapter level, SAX motif patterns in sliding windows achieve 30x-above-chance attribution, far exceeding the scalar features that dominate at book level. These signals are complementary, not redundant. We show that the fingerprint is partly confounded with genre but persists within-genre for approximately one-quarter of authors. Classical authors (Twain, Austen, Kipling) show fingerprints comparable in strength to modern authors, suggesting the phenomenon is not an artifact of contemporary publishing conventions.

Paper Structure

This paper contains 28 sections, 3 equations, 6 figures, and 4 tables.

Figures (6)

  • Figure 1: Distribution of author fingerprint effect sizes (Books3, baseline SAX). Red bars indicate statistically significant authors ($p < 0.05$). The distribution is right-skewed, with a long tail of strongly fingerprinted authors.
  • Figure 2: Resolution scaling (Experiment 2). Left: significance rate and mean effect size increase monotonically with PAA segments. Right: at PAA$=64$, increasing $k$-gram length trades consistency-test power for attribution accuracy.
  • Figure 3: Fisher Discriminant Ratios for scalar features vs. SAX motifs. The four scalar features (FDR 6.97--9.05) are 6--8$\times$ more discriminative than SAX motifs (avg. 1.13) at book level, explaining the dominance of scalars in Experiment 3.
  • Figure 4: Multi-scale comparison of attribution performance ($\times$ above chance). At book level (blue band), scalar dynamics dominate at 29$\times$. At window level (green band), SAX motifs dominate at 30.5$\times$. The scale inversion is the paper's central finding.
  • Figure 5: Genre disentangling: within-cluster fingerprint rates. The overall 13.6% rate drops to 7.6% in formulaic clusters but rises to 25% in literary clusters, confirming that a substantial fraction of fingerprints survive genre control.
  • ...and 1 more figure
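
The Fisher Discriminant Ratios cited for Figure 3 compare how well a single feature separates authors. The paper's exact definition is not given in this section, so the sketch below uses one common single-feature form as an assumption: between-author scatter divided by within-author scatter. The function name and the toy data are hypothetical.

```python
import numpy as np


def fisher_ratio(values, labels):
    """Between-class scatter over within-class scatter for one feature.

    One common form of the Fisher discriminant ratio; assumed here,
    not taken from the paper.
    """
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    grand_mean = values.mean()
    between = 0.0
    within = 0.0
    for label in np.unique(labels):
        group = values[labels == label]
        # Scatter of this class mean around the grand mean, weighted by class size
        between += group.size * (group.mean() - grand_mean) ** 2
        # Scatter of the class members around their own mean
        within += ((group - group.mean()) ** 2).sum()
    return between / within if within > 0 else float("inf")


# Toy example: one feature (e.g. mean novelty) for three books each by two authors
feature = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
authors = ["a", "a", "a", "b", "b", "b"]
print(fisher_ratio(feature, authors))
```

A higher ratio means the feature varies more between authors than within one author's books, which is the sense in which Figure 3 describes scalar features as more discriminative than SAX motifs at book level.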