Authorship Impersonation via LLM Prompting does not Evade Authorship Verification Methods

Baoyi Zeng, Andrea Nini

Abstract

Authorship verification (AV), the task of determining whether a questioned text was written by a specific individual, is a critical part of forensic linguistics. While manual authorial impersonation by perpetrators has long been a recognized threat in historical forensic cases, recent advances in large language models (LLMs) raise new challenges, as adversaries may exploit these tools to impersonate another person's writing. This study investigates whether prompted LLMs can generate convincing authorial impersonations and whether such outputs can evade existing forensic AV systems. Using GPT-4o as the adversary model, we generated impersonation texts under four prompting conditions across three genres: emails, text messages, and social media posts. We then evaluated these outputs against both non-neural AV methods (n-gram tracing, Ranking-Based Impostors Method, LambdaG) and neural approaches (AdHominem, LUAR, STAR) within a likelihood-ratio framework. Results show that LLM-generated texts failed to sufficiently replicate authorial individuality to bypass established AV systems. We also observed that some methods achieved even higher accuracy when rejecting impersonation texts than when rejecting genuine negative samples. Overall, these findings indicate that, despite the accessibility of LLMs, current AV systems remain robust against entry-level impersonation attempts across multiple genres. Furthermore, we demonstrate that this counter-intuitive resilience stems, at least in part, from the higher lexical diversity and entropy inherent in LLM-generated texts.
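
To make the likelihood-ratio setup mentioned in the abstract concrete, the sketch below illustrates the general shape of such a verification decision: a raw stylistic similarity score between a known-author sample and a questioned text is mapped to a calibrated log-likelihood ratio (LLR), with positive values supporting the same-author hypothesis and negative values the different-author hypothesis. The feature choice (character 4-gram TF-IDF), the logistic-regression calibrator, and the toy data are illustrative assumptions, not the specific systems (n-gram tracing, RBI, LambdaG, AdHominem, LUAR, STAR) evaluated in the paper.

```python
# Minimal sketch of a likelihood-ratio authorship verification decision.
# All feature choices, the calibrator, and the toy data are illustrative
# assumptions; they are not the systems evaluated in the paper.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics.pairwise import cosine_similarity


def similarity(known: str, questioned: str) -> float:
    """Raw stylistic similarity from character 4-gram TF-IDF vectors."""
    vec = TfidfVectorizer(analyzer="char", ngram_range=(4, 4))
    X = vec.fit_transform([known, questioned])
    return float(cosine_similarity(X)[0, 1])


def fit_calibrator(scores, labels):
    """Fit a logistic-regression calibrator on held-out same-author (1)
    and different-author (0) score/label pairs."""
    cal = LogisticRegression()
    cal.fit(np.asarray(scores, dtype=float).reshape(-1, 1), labels)
    return cal


def calibrated_llr(cal, score: float) -> float:
    """Map a raw score to a log10 likelihood ratio. With balanced calibration
    classes, the posterior log-odds approximate the log likelihood ratio."""
    p = cal.predict_proba([[score]])[0, 1]
    return float(np.log10(p / (1.0 - p)))


if __name__ == "__main__":
    # Toy calibration scores from known same-/different-author pairs.
    cal = fit_calibrator([0.82, 0.75, 0.68, 0.30, 0.22, 0.15],
                         [1, 1, 1, 0, 0, 0])
    s = similarity("known writings of the candidate author ...",
                   "questioned text, possibly an LLM-generated impersonation ...")
    print(f"raw similarity = {s:.3f}, calibrated LLR = {calibrated_llr(cal, s):+.2f}")
    # LLR > 0 supports the same-author hypothesis; LLR < 0 the different-author one.
```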

Figures (3)

  • Figure 1: LLR scores produced by six authorship verification methods on LLM-generated impersonation texts under four prompting techniques, across three corpora: Enron (top), BOLT (middle), and Twitter (bottom). Bars indicate mean calibrated LLRs, with error bars representing 95% confidence intervals. Results for non-neural authorship verification methods (LambdaG, n-gram tracing, and RBI) are obtained with POSNoise preprocessing applied, while neural network-based methods (AdHominem, LUAR, and STAR) are evaluated on texts without POSNoise preprocessing.
  • Figure 2: Performance degradation of authorship verification methods on LLM-generated impersonation texts compared to genuine test cases, averaged across four prompting conditions (naive, self-prompting, role-play, and tree-of-thoughts). Error bars represent 95% confidence intervals. Top: TNR degradation. Values are computed as $(\text{TNR}_{\text{test}} - \text{TNR}_{\text{impersonation}}) / \text{TNR}_{\text{test}}$. Positive values represent the proportional loss of a method's original ability to correctly reject different-author pairs when facing generated impersonations, whereas negative values indicate improved performance when rejecting LLM-generated impersonations. Bottom: Confidence drop on TN cases, measured as the difference in averaged LLR magnitudes ($|\text{LLR}_{\text{test}}| - |\text{LLR}_{\text{impersonation}}|$). Since LLRs for true negative cases are inherently negative, this difference in magnitudes quantifies the loss of evidential strength. Positive values indicate that a method becomes less confident (i.e., assigns weaker evidential support to the different-author hypothesis) when rejecting LLM-generated impersonations, whereas negative values would indicate increased confidence.
  • Figure 3: Comparison between LLM-generated impersonation texts and human-authored texts in terms of compressed size (top), entropy (middle), and TTR (bottom). Error bars indicate 95% confidence intervals. Note that the y-axes are scaled to the specific data range of each metric. A brief illustrative sketch of how these three metrics can be computed follows this list.
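
The snippet below shows one straightforward way to compute the three text-complexity measures compared in Figure 3: compressed size, character-level Shannon entropy, and type-token ratio (TTR). The specific definitions used here (zlib compression of UTF-8 bytes, bits-per-character entropy, whitespace tokenisation) are assumptions for illustration and may differ from the paper's exact operationalisation; the toy texts are likewise invented.

```python
# Illustrative implementations of the three complexity measures from Figure 3.
# Definitions (zlib, bits per character, whitespace tokens) are assumptions and
# may differ from the paper's operationalisation.
import math
import zlib
from collections import Counter


def compressed_size(text: str) -> int:
    """Bytes after zlib compression of the UTF-8 text (lower = more redundant)."""
    return len(zlib.compress(text.encode("utf-8")))


def char_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    counts = Counter(text)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def type_token_ratio(text: str) -> float:
    """Distinct word types divided by total word tokens (lexical diversity)."""
    tokens = text.lower().split()
    return len(set(tokens)) / len(tokens) if tokens else 0.0


# Toy comparison between a casual human-style message and an LLM-style rewrite.
human = "hey u coming tonight? same place as last time lol"
llm = "Greetings! I was wondering whether you might be attending this evening."
for name, t in [("human", human), ("LLM", llm)]:
    print(f"{name}: size={compressed_size(t)}B "
          f"entropy={char_entropy(t):.2f} TTR={type_token_ratio(t):.2f}")
```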