Unraveling Interwoven Roles of Large Language Models in Authorship Privacy: Obfuscation, Mimicking, and Verification

Tuc Nguyen, Yifan Hu, Thai Le

TL;DR

The paper presents the first unified framework to study the interdependent roles of authorship obfuscation, mimicking, and verification in the context of large language models. It formalizes isolation, pairwise, and triplet-wise interdependencies, and evaluates them across multiple LLMs, datasets, and metadata conditions. Key findings show obfuscation generally disrupts author signals, while mimicking can partially recover stylistic traits over time; demographic metadata enhances verification and impersonation capabilities, increasing privacy risk for well-known individuals. The results underscore the dual-use nature of LLMs, emphasizing the need for robust detection, privacy-aware tooling, and transparent handling of metadata in authorship tasks.

Abstract

Recent advancements in large language models (LLMs) have been fueled by large-scale training corpora drawn from diverse sources such as websites, news articles, and books. These datasets often contain explicit user information, such as names and addresses, that LLMs may unintentionally reproduce in their generated outputs. Beyond such explicit content, LLMs can also leak identity-revealing cues through implicit signals such as distinctive writing styles, raising significant concerns about authorship privacy. There are three major automated tasks in authorship privacy: authorship obfuscation (AO), authorship mimicking (AM), and authorship verification (AV). Prior research has studied AO, AM, and AV independently. However, their interplay remains underexplored, which leaves a major research gap, especially in the era of LLMs, which are profoundly shaping how we curate and share user-generated content and are increasingly blurring the distinction between machine-generated and human-authored text. This work presents the first unified framework for analyzing the dynamic relationships among LLM-enabled AO, AM, and AV in the context of authorship privacy. We quantify how they interact with each other to transform human-authored text, examining effects at a single point in time and iteratively over time. We also examine the role of demographic metadata, such as gender and academic background, in modulating their performance, inter-task dynamics, and privacy risks. All source code will be publicly available.

Paper Structure

This paper contains 29 sections, 9 equations, 3 figures, 21 tables.

Figures (3)

  • Figure 1: The interactive influence loop between LLMs, obfuscation, mimicking, and verification.
  • Figure 2: We present an overall pairwise interdependency evaluation of each LLM across the tasks of AO, AM, and AV. For each aspect, the final score is computed as the average across two "judge" evaluations to enable relative comparison.
  • Figure 3: Verification accuracy ($\uparrow$), KL ($\downarrow$), and Human-likeness scores of mimicked and obfuscated texts compared to original texts across datasets, both with and without metadata. The x-axis represents the step order, ranging from 1 to 10, which covers 5 iterations alternating between AM$\rightarrow$AO$\rightarrow$AM$\rightarrow$...$\rightarrow$AO. AV is used as an intermediate step after AO and does not generate any text, so we hide it for clarity. We refer to Table \ref{detail_loop_verification} for the detailed results.
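
The step ordering described in the Figure 3 caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `step_schedule` and `kl_divergence` are hypothetical, and the KL computation over smoothed whitespace-token unigram frequencies is an assumed proxy, since the summary does not say which distributions the paper compares.

```python
import math
from collections import Counter


def step_schedule(iterations=5):
    """Return the task label for each step: AM, AO, AM, AO, ...

    With 5 iterations this gives 10 steps, matching the x-axis of Figure 3.
    AV runs after each AO step but generates no text, so it is not listed.
    """
    return [task for _ in range(iterations) for task in ("AM", "AO")]


def kl_divergence(p_text, q_text, eps=1e-9):
    """KL(P || Q) over smoothed unigram frequencies (assumed proxy metric)."""
    p_counts = Counter(p_text.split())
    q_counts = Counter(q_text.split())
    vocab = set(p_counts) | set(q_counts)
    # Additive smoothing keeps every probability strictly positive.
    p_total = sum(p_counts.values()) + eps * len(vocab)
    q_total = sum(q_counts.values()) + eps * len(vocab)
    kl = 0.0
    for word in vocab:
        p = (p_counts[word] + eps) / p_total
        q = (q_counts[word] + eps) / q_total
        kl += p * math.log(p / q)
    return kl


if __name__ == "__main__":
    for step, task in enumerate(step_schedule(), start=1):
        print(step, task)
```

A lower KL value between a transformed text and the original suggests the two share more of their word distribution, which is consistent with the $\downarrow$ marker in the caption.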