StyleRemix: Interpretable Authorship Obfuscation via Distillation and Perturbation of Style Elements
Jillian Fisher, Skyler Hallinan, Ximing Lu, Mitchell Gordon, Zaid Harchaoui, Yejin Choi
TL;DR
This paper tackles authorship obfuscation by introducing StyleRemix, an interpretable and efficient method that perturbs fine-grained elements of an author's style using Low Rank Adaptation (LoRA) adapters. It follows a two-stage workflow: Stage 1 distills style elements into 16 directions along seven style axes, and Stage 2 obfuscates by selecting axes and merging the corresponding adapters with controllable weights. The authors release AuthorMix, a corpus of 30K long-form texts from 14 authors across 4 domains, and DiSC, a parallel corpus of 1,500 texts spanning the same seven style axes, to support evaluation and future research. StyleRemix outperforms state-of-the-art baselines and much larger LLMs across four domains; human evaluations confirm stronger obfuscation and higher overall quality, with content preservation and fluency maintained. The work also emphasizes interpretability and controllability, and it acknowledges the ethical considerations of stylistic obfuscation.
Abstract
Authorship obfuscation, rewriting a text to intentionally obscure the identity of the author, is an important but challenging task. Current methods using large language models (LLMs) lack interpretability and controllability, often ignoring author-specific stylistic features, resulting in less robust performance overall. To address this, we develop StyleRemix, an adaptive and interpretable obfuscation method that perturbs specific, fine-grained style elements of the original input text. StyleRemix uses pre-trained Low Rank Adaptation (LoRA) modules to rewrite an input specifically along various stylistic axes (e.g., formality and length) while maintaining low computational cost. StyleRemix outperforms state-of-the-art baselines and much larger LLMs in a variety of domains as assessed by both automatic and human evaluation. Additionally, we release AuthorMix, a large set of 30K high-quality, long-form texts from a diverse set of 14 authors and 4 domains, and DiSC, a parallel corpus of 1,500 texts spanning seven style axes in 16 unique directions.
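
To make the idea of combining LoRA modules along several style axes concrete, the following is a minimal sketch of weighted adapter merging. It is not the authors' implementation: it assumes the generic LoRA formulation, in which each adapter contributes a low-rank update B·A to a base weight matrix, and it assumes a simple weighted sum as the merge rule. The adapter names (`formality_up`, `length_down`), the matrices, and the weights are all hypothetical values chosen for illustration.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product, used to keep the sketch dependency-free."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(
    base: Matrix,
    adapters: Dict[str, Tuple[Matrix, Matrix]],
    weights: Dict[str, float],
) -> Matrix:
    """Return base + sum_k weights[k] * (B_k @ A_k) over the selected adapters.

    Assumption: a weighted sum of low-rank updates; other merge rules exist.
    """
    merged = [row[:] for row in base]  # copy so the base weights stay untouched
    for name, weight in weights.items():
        b, a = adapters[name]
        delta = matmul(b, a)  # low-rank update contributed by this style direction
        for i, row in enumerate(delta):
            for j, value in enumerate(row):
                merged[i][j] += weight * value
    return merged


# Hypothetical 2x2 base weight and two rank-1 adapters, each stored as (B, A).
base = [[1.0, 0.0], [0.0, 1.0]]
adapters = {
    "formality_up": ([[1.0], [0.0]], [[0.0, 2.0]]),
    "length_down": ([[0.0], [1.0]], [[4.0, 0.0]]),
}

# Hypothetical per-axis weights: a larger weight pushes harder along that axis.
weights = {"formality_up": 0.5, "length_down": 0.25}

merged = merge_lora(base, adapters, weights)
print(merged)
```

Because each axis has its own weight, the strength of every style change can be tuned or switched off independently, which is one way to read the controllability described above.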
