ALISON: Fast and Effective Stylometric Authorship Obfuscation

Eric Xing, Saranya Venkatraman, Thai Le, Dongwon Lee

TL;DR

ALISON addresses the need for practical, fast, and interpretable authorship obfuscation (AO) against strong transformer-based authorship attribution (AA) methods. It combines an internal classifier, trained once on POS-informed stylometric features, with masked-language-model-based phrase replacement and a one-gram-at-a-time obfuscation process. The result is obfuscation more than 10x faster than SOTA AO methods and typically 15% more successful against three transformer-based AA models on two benchmark datasets, while preserving semantics. The approach is transparent because its stylometric features are interpretable, and it also prevents four SOTA AA methods from accurately attributing ChatGPT-generated texts with minimal semantic degradation. This work provides a reproducible, scalable solution for privacy-preserving text editing in realistic blind-attack settings, where no direct signal from the target AA classifier is available, and highlights practical considerations for ethics and deployment.

Abstract

Authorship Attribution (AA) and Authorship Obfuscation (AO) are two competing tasks of increasing importance in privacy research. Modern AA leverages an author's consistent writing style to match a text to its author using an AA classifier. AO is the corresponding adversarial task, aiming to modify a text in such a way that its semantics are preserved, yet an AA model cannot correctly infer its authorship. To address privacy concerns raised by state-of-the-art (SOTA) AA methods, new AO methods have been proposed, but they remain largely impractical due to prohibitively slow training and obfuscation, often taking hours. To address this challenge, we propose a practical AO method, ALISON, that (1) dramatically reduces training/obfuscation time, demonstrating more than 10x faster obfuscation than SOTA AO methods, (2) achieves better obfuscation success when attacking three transformer-based AA methods on two benchmark datasets, typically performing 15% better than competing methods, (3) does not require direct signals from a target AA classifier during obfuscation, and (4) utilizes unique stylometric features, allowing sound model interpretation for explainable obfuscation. We also demonstrate that ALISON can effectively prevent four SOTA AA methods from accurately determining the authorship of ChatGPT-generated texts, all while minimally changing the original text semantics. To ensure the reproducibility of our findings, our code and data are available at: https://github.com/EricX003/ALISON.
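
The stylometric features mentioned in the abstract include part-of-speech (POS) n-grams; the caption of Figure 3 refers to extracting POS trigrams. The following is a minimal sketch of that idea, not the authors' implementation. The function names `pos_ngrams` and `pos_ngram_profile`, the hand-assigned Penn Treebank-style tags, and the relative-frequency normalization are all assumptions made for illustration. In practice the tags would come from a POS tagger.

```python
from collections import Counter
from typing import Dict, List, Tuple


def pos_ngrams(tags: List[str], n: int = 3) -> List[Tuple[str, ...]]:
    """Return every contiguous POS n-gram in a tag sequence (hypothetical helper)."""
    return [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]


def pos_ngram_profile(tags: List[str], n: int = 3) -> Dict[Tuple[str, ...], float]:
    """Map each POS n-gram to its relative frequency (normalization is an assumption)."""
    grams = pos_ngrams(tags, n)
    if not grams:
        return {}
    counts = Counter(grams)
    return {gram: count / len(grams) for gram, count in counts.items()}


# Tags below are assigned by hand for illustration, not produced by a tagger.
tagged = [
    ("The", "DT"), ("quick", "JJ"), ("fox", "NN"), ("jumps", "VBZ"),
    ("over", "IN"), ("the", "DT"), ("lazy", "JJ"), ("dog", "NN"),
]
tags = [tag for _, tag in tagged]

trigrams = pos_ngrams(tags)
profile = pos_ngram_profile(tags)

# Print the trigrams from most to least frequent.
for gram, freq in sorted(profile.items(), key=lambda kv: -kv[1]):
    print(" ".join(gram), round(freq, 3))
```

A profile like this yields one interpretable number per POS pattern, which is the kind of feature a stylometric classifier could rank by importance when deciding which phrases to rewrite.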

Paper Structure (18 sections, 6 figures, 4 tables)

Figures (6)

  • Figure 1: ALISON successfully obfuscating a text by changing its style while preserving semantics.
  • Figure 2: ALISON: Our proposed obfuscation pipeline.
  • Figure 3: An example of extracting POS trigrams.
  • Figure 4: Distribution of author-wise contributions to label entropy post-obfuscation.
  • Figure 5: Effect of varying $L$ on obfuscation success and semantic preservation.
  • ...and 1 more figure