Low-Resource Authorship Style Transfer: Can Non-Famous Authors Be Imitated?

Ajay Patel, Nicholas Andrews, Chris Callison-Burch

TL;DR

The authors develop an in-context learning technique that their results establish as the strongest baseline for low-resource authorship style transfer, though they find that current approaches do not yet master this challenging task.

Abstract

Authorship style transfer involves altering text to match the style of a target author whilst preserving the original meaning. Existing unsupervised approaches like STRAP have largely focused on style transfer to target authors with many examples of their writing style in books, speeches, or other published works. This high-resource training data requirement (often greater than 100,000 words) makes these approaches primarily useful for style transfer to published authors, politicians, or other well-known figures and authorship styles, while style transfer to non-famous authors has not been well studied. We introduce the low-resource authorship style transfer task, a more challenging class of authorship style transfer where only a limited amount of text in the target author's style may exist. In our experiments, we specifically choose source and target authors from Reddit and style transfer their Reddit posts, limiting ourselves to just 16 posts (on average ~500 words) of the target author's style. Style transfer accuracy is typically measured by how often a classifier or human judge will classify an output as written by the target author. Recent authorship representation models excel at authorship identification even with just a few writing samples, making automatic evaluation of this task possible for the first time through evaluation metrics we propose. Our results establish an in-context learning technique we develop as the strongest baseline, though we find current approaches do not yet achieve mastery of this challenging task. We release our data and implementations to encourage further investigation.
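
To make the accuracy notion in the abstract concrete, the following is a minimal sketch of a classifier-style check: an output counts as a success when its authorship embedding is closer to the target author's writing than to the source author's. This is an illustration only, not the evaluation metrics the paper proposes. The function names, the nearest-centroid rule, and the toy 2-dimensional vectors are all assumptions; in practice the vectors would come from an authorship representation model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def centroid(vectors):
    """Component-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def transfer_accuracy(output_vecs, source_vecs, target_vecs):
    """Fraction of outputs whose embedding is closer to the target author.

    Hypothetical nearest-centroid rule (an assumption, not the paper's metric):
    an output is attributed to the target author when its cosine similarity to
    the target centroid exceeds its similarity to the source centroid.
    """
    src = centroid(source_vecs)
    tgt = centroid(target_vecs)
    hits = sum(1 for v in output_vecs if cosine(v, tgt) > cosine(v, src))
    return hits / len(output_vecs)

# Toy embeddings standing in for a real authorship representation model.
source_vecs = [[1.0, 0.0], [0.9, 0.1]]
target_vecs = [[0.0, 1.0], [0.1, 0.9]]
output_vecs = [[0.2, 0.8], [0.8, 0.2], [0.1, 1.0], [0.4, 0.6]]

accuracy = transfer_accuracy(output_vecs, source_vecs, target_vecs)
print(accuracy)
```

With only 16 target posts, the target centroid in such a scheme rests on very few samples, which is why the quality of the underlying authorship representation matters so much in the low-resource setting.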

Paper Structure

This paper contains 26 sections, 4 equations, 2 figures, 3 tables.

Figures (2)

  • Figure 1: An actual output of Styll on the unsupervised low-resource authorship style transfer task between two Reddit users using just 16 Reddit posts as examples of the target style.
  • Figure 2: Scores of various evaluation metrics and a joint score, $J(\textsc{a}, \textsc{s}, \textsc{f})$, on style transfer outputs produced by $\textsc{Strap}_{p=0.0}$ on the Shakespeare author imitation dataset given decreasing amounts of training example tokens. STRAP's performance falls off as the number of training tokens decreases and drops precipitously in the low-resource setting.