
StyleRemix: Interpretable Authorship Obfuscation via Distillation and Perturbation of Style Elements

Jillian Fisher, Skyler Hallinan, Ximing Lu, Mitchell Gordon, Zaid Harchaoui, Yejin Choi

TL;DR

This paper tackles authorship obfuscation by introducing StyleRemix, an interpretable, efficient approach that perturbs fine-grained elements of an author's style via Low Rank Adaptation (LoRA) adapters. It uses a two-stage workflow. Stage 1 (pre-obfuscation) distills distinct style elements from an LM into training sets covering 16 style directions across seven style axes, and trains a LoRA adapter for each. Stage 2 (obfuscation) selects the axes to change and merges the corresponding adapters with controllable weights. The authors release AuthorMix, a large, diverse corpus, and DiSC, a parallel dataset spanning seven style axes, to support evaluation and future research. Empirically, StyleRemix outperforms strong baselines and even larger LLMs across four domains, and human evaluations confirm higher obfuscation and overall quality while content preservation and fluency are maintained. The work emphasizes interpretability and controllability, releases its resources to support further research on stylistic obfuscation, and discusses ethical considerations.
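The Stage 2 step summarized above (select axes, then merge the matching adapters with controllable weights) can be illustrated with a small plain-Python sketch. Two rules here are our assumptions, not taken from the paper: the selection rule (pick the k axes on which the author's style score deviates most from a corpus mean, push each in the opposite direction, and weight it by the size of the deviation) and the merge rule, a linear weighted sum of LoRA updates, W = W0 + s * sum_i w_i B_i A_i. The helper names `select_axes` and `merge_lora` and the `less_`/`more_` direction labels are hypothetical.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain-Python matrix product x @ y."""
    return [
        [sum(x[i][t] * y[t][j] for t in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def select_axes(author: Dict[str, float], corpus_mean: Dict[str, float], k: int) -> Dict[str, float]:
    """Hypothetical selection rule: choose the k axes where the author deviates
    most from the corpus mean and steer in the opposite direction.
    Returns {direction_name: weight}, with weight = |deviation|."""
    deviation = {axis: author[axis] - corpus_mean[axis] for axis in author}
    top = sorted(deviation, key=lambda axis: abs(deviation[axis]), reverse=True)[:k]
    chosen: Dict[str, float] = {}
    for axis in top:
        # Above the mean -> push toward "less"; below the mean -> push toward "more".
        direction = f"less_{axis}" if deviation[axis] > 0 else f"more_{axis}"
        chosen[direction] = abs(deviation[axis])
    return chosen


def merge_lora(
    base: Matrix,
    adapters: Dict[str, Tuple[Matrix, Matrix]],
    weights: Dict[str, float],
    scale: float = 1.0,
) -> Matrix:
    """Assumed linear merge: W = W0 + scale * sum_i w_i * (B_i @ A_i)."""
    merged = [row[:] for row in base]  # copy so the base weights stay untouched
    for name, w in weights.items():
        b, a = adapters[name]  # B: d_out x r, A: r x d_in
        delta = matmul(b, a)
        for i in range(len(merged)):
            for j in range(len(merged[0])):
                merged[i][j] += scale * w * delta[i][j]
    return merged


if __name__ == "__main__":
    # Toy style scores on three axes (illustrative values only).
    author = {"formality": 0.75, "length": 0.25, "sarcasm": 0.5}
    corpus = {"formality": 0.5, "length": 0.5, "sarcasm": 0.5}
    weights = select_axes(author, corpus, k=2)
    print(weights)  # {'less_formality': 0.25, 'more_length': 0.25}

    # Toy 2x2 base weight and rank-1 adapters (B, A) for two directions.
    base = [[1.0, 0.0], [0.0, 1.0]]
    adapters = {
        "less_formality": ([[1.0], [0.0]], [[1.0, 1.0]]),
        "more_length": ([[0.0], [2.0]], [[0.0, 1.0]]),
    }
    print(merge_lora(base, adapters, weights))  # [[1.25, 0.25], [0.0, 1.5]]
```

Because each adapter only contributes a scaled low-rank delta, changing a single weight moves the output along one style direction, which is the kind of per-axis control the TL;DR describes.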

Abstract

Authorship obfuscation, rewriting a text to intentionally obscure the identity of the author, is an important but challenging task. Current methods using large language models (LLMs) lack interpretability and controllability, often ignoring author-specific stylistic features, resulting in less robust performance overall. To address this, we develop StyleRemix, an adaptive and interpretable obfuscation method that perturbs specific, fine-grained style elements of the original input text. StyleRemix uses pre-trained Low Rank Adaptation (LoRA) modules to rewrite an input specifically along various stylistic axes (e.g., formality and length) while maintaining low computational cost. StyleRemix outperforms state-of-the-art baselines and much larger LLMs in a variety of domains as assessed by both automatic and human evaluation. Additionally, we release AuthorMix, a large set of 30K high-quality, long-form texts from a diverse set of 14 authors and 4 domains, and DiSC, a parallel corpus of 1,500 texts spanning seven style axes in 16 unique directions.
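The low computational cost of LoRA comes from training only a rank-r factorization B A of each weight update instead of the full matrix. The arithmetic below compares trainable parameters per adapted matrix; the 4096 × 4096 shape and rank 8 are illustrative assumptions, not the configuration reported in the paper, and the function names are hypothetical.

```python
def full_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when fine-tuning a dense d_out x d_in weight matrix."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters of a LoRA update B @ A, with B: d_out x r and A: r x d_in."""
    return rank * (d_out + d_in)


if __name__ == "__main__":
    d, r = 4096, 8  # illustrative sizes only
    print(full_params(d, d))                           # 16777216
    print(lora_params(d, d, r))                        # 65536
    print(full_params(d, d) // lora_params(d, d, r))   # 256
```

For these sizes, one LoRA update has 65,536 trainable parameters versus 16,777,216 for the dense matrix, a 256-fold reduction per adapted matrix, which is why keeping a separate adapter for each of the 16 style directions stays inexpensive.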

Paper Structure (78 sections, 5 equations, 7 figures, 13 tables)

Figures (7)

  • Figure 1: Overview of StyleRemix. In pre-obfuscation, distinct style elements are distilled from an LM into individual training sets, which are used to train specialized LoRA adapters. During obfuscation, the user can automatically or manually select the style adapter(s) which, when combined with the base LM, will best steer generations away from the original style.
  • Figure 2: Generations from rewriting a text from AuthorMix-Speech with each style-axis adapter individually. The comparison shows the distinct transformation each adapter performs, with variations in tone, formality, and other linguistic features. The direction of each style axis is chosen with the automatic style selection method described in the Stage 2 obfuscation section.
  • Figure 3: Human evaluation results for mean grammar, fluency, content preserved, less content added, and obfuscation. For each of the metrics, higher is better. We also compute the mean overall score, the product of grammar, content preserved, and less style similarity.
  • Figure 4: Base model merging with random styles, n = 3
  • Figure 5: Seq. shuffle n = 3
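The overall score in Figure 3 can be sketched as below. The caption only says it is the product of grammar, content preserved, and less style similarity; that the product is taken per evaluated item, that each rating is first normalized to [0, 1], and that the products are then averaged are our assumptions. The function and field names are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def overall_score(ratings: List[Dict[str, float]]) -> float:
    """Mean over items of grammar * content_preserved * less_style_similarity.
    Assumes every rating has already been normalized to [0, 1]."""
    return mean(
        r["grammar"] * r["content_preserved"] * r["less_style_similarity"]
        for r in ratings
    )


if __name__ == "__main__":
    ratings = [
        {"grammar": 1.0, "content_preserved": 0.5, "less_style_similarity": 0.5},
        {"grammar": 0.5, "content_preserved": 1.0, "less_style_similarity": 1.0},
    ]
    print(overall_score(ratings))  # 0.375
```

A product, unlike an average, stays low whenever any single factor is low, so a rewrite that obscures style well but loses the content (or becomes ungrammatical) cannot earn a high overall score.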