
Forging the Forger: An Attempt to Improve Authorship Verification via Data Augmentation

Silvia Corbara, Alejandro Moreo

TL;DR

Experimental results reveal that, although the methodology proves effective in many adversarial settings, its benefits are too sporadic for practical application.

Abstract

Authorship Verification (AV) is a text classification task concerned with inferring whether a candidate text was written by one specific author or by someone else. Many AV systems have been shown to be vulnerable to adversarial attacks, in which a malicious author actively tries to fool the classifier either by concealing their own writing style or by imitating the style of another author. In this paper, we investigate the potential benefits of augmenting the classifier training set with (negative) synthetic examples generated to imitate the style of the author of interest. We analyze the improvements in classifier prediction that this augmentation brings to AV in an adversarial setting. In particular, we experiment with three generator architectures (one based on Recurrent Neural Networks, one based on small-scale transformers, and one based on the popular GPT model) and with two training strategies (one inspired by standard Language Models, and one inspired by Wasserstein Generative Adversarial Networks). We evaluate our hypothesis on five datasets (three of which were specifically collected to represent an adversarial setting) using two learning algorithms for the AV classifier (Support Vector Machines and Convolutional Neural Networks). The experiments yielded negative results: although our methodology proves effective in many adversarial settings, its benefits are too sporadic for practical application.

Paper Structure (18 sections, 2 equations, 3 figures, 4 tables)

Figures (3)

  • Figure 1: Top: Flowchart of a standard AV method. Bottom: Flowchart of our proposed AV method, where representative examples of forgery are added to $\overline{A}$.
  • Figure 2: The CNN classifier architecture. The dotted lines represent alternative branches.
  • Figure 3: Plots of different datasets for one randomly chosen author per dataset. In each plot, we display the examples by $A$ in training and test, the examples by the other authors in training and test, and the examples created by the generator ($Fake$ in training). The plots are generated via manifold learning, applying t-SNE to the internal representation of the corresponding trained CNN classifier. We also show the cosine distance between the centroid of all the examples by $A$ ($A$ centroid) and i) the centroid of all the documents by the authors in $\overline{A}$ ($\overline{A}$ centroid), and ii) the centroid of the generated examples combined with all the examples by $\overline{A}$ ($\overline{A}+Fake$ centroid).
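
The two centroid distances reported in Figure 3 can be illustrated with a minimal sketch. This is not the authors' code: the function names, the variable names, and the two-dimensional toy vectors are all hypothetical, standing in for the CNN's internal representations of the documents.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def centroid(vectors: Sequence[Vector]) -> List[float]:
    """Component-wise mean of a non-empty collection of equal-length vectors."""
    if not vectors:
        raise ValueError("centroid of an empty collection is undefined")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_distance(u: Vector, v: Vector) -> float:
    """One minus the cosine similarity of u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical toy representations (assumption: real ones would be
# high-dimensional vectors taken from the trained classifier).
docs_by_a = [[1.0, 0.0], [0.8, 0.2]]        # examples by A
docs_by_others = [[0.0, 1.0], [0.2, 0.8]]   # examples by the authors in A-bar
fake_docs = [[0.9, 0.1], [0.7, 0.3]]        # generated examples imitating A

a_centroid = centroid(docs_by_a)

# i) distance between the A centroid and the A-bar centroid
dist_without_fakes = cosine_distance(a_centroid, centroid(docs_by_others))

# ii) distance between the A centroid and the (A-bar + Fake) centroid
dist_with_fakes = cosine_distance(a_centroid, centroid(docs_by_others + fake_docs))
```

In this toy data the generated examples lie close to $A$, so adding them to the negative class pulls its centroid toward the $A$ centroid and the second distance comes out smaller than the first. Whether this happens on real data depends on how well the generator imitates the author.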