
Reproduction and Replication of an Adversarial Stylometry Experiment

Haining Wang, Patrick Juola, Allen Riddell

TL;DR

This paper reproduces and replicates experiments from a seminal study of defenses against authorship attribution and finds evidence suggesting that an entirely automatic method, round-trip translation, warrants re-examination because it appears to reduce the effectiveness of established authorship attribution methods.

Abstract

Maintaining anonymity in natural language communication remains a challenging task. Even when the number of candidate authors is large, standard authorship attribution techniques that analyze writing style predict the original author with uncomfortably high accuracy. Adversarial stylometry provides a defense against authorship attribution, helping users avoid unwanted deanonymization. This paper reproduces and replicates experiments from a seminal study of defenses against authorship attribution (Brennan et al., 2012). After reproducing the experiment using the original data, we replicate it by repeating the online field experiment using the procedures described in the original paper. Although we reach the same conclusion as the original paper, our results suggest that the effectiveness of the defenses studied may be overstated, largely because the original study lacked a control group. In our replication, we find evidence suggesting that an entirely automatic method, round-trip translation, warrants re-examination because it appears to reduce the effectiveness of established authorship attribution methods.

Paper Structure (21 sections, 2 figures, 3 tables)


Figures (2)

  • Figure 1: Reproduction of the adversarial stylometry experiment in Brennan et al. (2012). The left panel shows the accuracy of a 10-fold cross-validation using the training data. The middle and right panels indicate the accuracy of the classifier when the indicated authorship attribution circumvention strategy is used. Each bar indicates the mean accuracy; error bars show 95% confidence intervals. 10-fold cross-validation is not performed with the RoBERTa model.
  • Figure 2: Replication of the adversarial stylometry experiment in Brennan et al. (2012) using the Riddell-Juola corpus. The top panels show classifier accuracy on the training data measured using 10-fold cross-validation, the obfuscation strategy, and the imitation strategy. The bottom-left panel shows classifier accuracy on writing from the control group. The bottom-middle and bottom-right panels show the accuracy of two round-trip translation strategies. Each bar indicates the mean accuracy with candidate sets randomly sampled 1,000 times (without replacement) from the pool of participants who used the indicated strategy.
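
The resampling described in the Figure 2 caption can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the names `pool`, `predict`, and `set_size` are hypothetical, and it assumes that "without replacement" means the authors within each sampled candidate set are distinct. The paper's actual classifier, candidate-set size, and scoring details are not given in this section.

```python
import random
from statistics import mean


def mean_accuracy_over_candidate_sets(pool, predict, set_size, n_draws=1000, seed=0):
    """Estimate mean attribution accuracy over randomly drawn candidate sets.

    pool     -- dict mapping author id -> that author's test document (hypothetical layout)
    predict  -- callable (document, candidates) -> predicted author id (stand-in classifier)
    set_size -- number of candidate authors per draw (assumed, not from the paper)
    """
    rng = random.Random(seed)
    authors = sorted(pool)
    per_draw = []
    for _ in range(n_draws):
        # Draw distinct authors for this candidate set (sampling without replacement).
        candidates = rng.sample(authors, set_size)
        # Score the classifier on each sampled author's document.
        correct = sum(predict(pool[a], candidates) == a for a in candidates)
        per_draw.append(correct / set_size)
    # The bar height in such a plot would correspond to this mean.
    return mean(per_draw)


# Toy data: each "document" ends with its author's id, so an oracle can read it back.
toy_pool = {f"a{i}": f"sample text by a{i}" for i in range(20)}


def oracle(doc, candidates):
    # Always recovers the true author from the toy document.
    return doc.split()[-1]


def first_guess(doc, candidates):
    # Ignores the document and always picks the first candidate.
    return candidates[0]


oracle_acc = mean_accuracy_over_candidate_sets(toy_pool, oracle, set_size=5)
first_acc = mean_accuracy_over_candidate_sets(toy_pool, first_guess, set_size=5)
```

With the toy predictors, the oracle is right on every document, while the first-candidate guesser is right on exactly one of the five documents in each draw. Averaging over many draws smooths out the dependence of accuracy on which particular authors happen to be in the candidate set.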