Tuning for TraceTarnish: Techniques, Trends, and Testing Tangible Traits

Robert Dilworth

TL;DR

This work advances adversarial stylometry by evaluating TraceTarnish, a pipeline that combines machine translation, paraphrasing, and steganography to obfuscate authorial style in text. It introduces Injection as a fourth strategy and leverages StyloMetrix with Information Gain to identify five highly informative stylometric cues that indicate obfuscation. Through a Reddit-based dataset, the study demonstrates how function-word usage, content-word distributions, and Type-Token Ratio shift under anonymization, creating potential indicators of compromise and forensic beacons. It also proposes enhancements, including an offline self-hosted LLM for quality control and an adversarial imitation step, to improve text coherence while preserving privacy.

Abstract

In this study, we more rigorously evaluated our attack script $\textit{TraceTarnish}$, which leverages adversarial stylometry principles to anonymize the authorship of text-based messages. To assess the efficacy and utility of our attack, we sourced, processed, and analyzed Reddit comments--comments that were later alchemized into $\textit{TraceTarnish}$ data--to gain valuable insights. The transformed $\textit{TraceTarnish}$ data was then further augmented by $\textit{StyloMetrix}$ to manufacture stylometric features--features that were culled using the Information Gain criterion, leaving only the most informative, predictive, and discriminative ones. Our results show that function words and function word types ($L\_FUNC\_A$ and $L\_FUNC\_T$); content words and content word types ($L\_CONT\_A$ and $L\_CONT\_T$); and the Type-Token Ratio ($ST\_TYPE\_TOKEN\_RATIO\_LEMMAS$) yielded significant Information Gain readings. The identified stylometric cues--function-word frequencies, content-word distributions, and the Type-Token Ratio--serve as reliable indicators of compromise (IoCs), revealing when a text has been deliberately altered to mask its true author. Similarly, these features could function as forensic beacons, alerting defenders to the presence of an adversarial stylometry attack; granted, in the absence of the original message, this signal may go largely unnoticed, as it appears to depend on a pre- and post-transformation comparison. "In trying to erase a trace, you often imprint a larger one." Armed with this understanding, we framed $\textit{TraceTarnish}$'s operations and outputs around these five isolated features, using them to conceptualize and implement enhancements that further strengthen the attack.
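To make the Information Gain criterion mentioned in the abstract concrete, the following is a minimal sketch of how a single numeric stylometric feature could be scored against the binary raw/anonymized label. It is an illustration under stated assumptions, not the authors' implementation: the single-threshold split, the function names `entropy` and `information_gain`, and the Type-Token Ratio values are all hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(values, labels, threshold):
    """Information Gain of splitting a numeric feature at `threshold`.

    IG = H(labels) - sum_k (|S_k| / |S|) * H(labels in S_k)
    where S_k are the two partitions (value <= threshold, value > threshold).
    """
    below = [y for x, y in zip(values, labels) if x <= threshold]
    above = [y for x, y in zip(values, labels) if x > threshold]
    total = len(labels)
    remainder = sum(
        (len(part) / total) * entropy(part) for part in (below, above) if part
    )
    return entropy(labels) - remainder


# Made-up Type-Token Ratio scores (assumption, not data from the paper).
# Label 0 = raw Reddit comment, label 1 = comment anonymized by TraceTarnish.
ttr_scores = [0.82, 0.79, 0.85, 0.80, 0.55, 0.60, 0.58, 0.62]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

# A threshold that separates the two classes removes all label uncertainty.
perfect_gain = information_gain(ttr_scores, labels, threshold=0.70)
# A threshold below every score separates nothing, so it gains nothing.
useless_gain = information_gain(ttr_scores, labels, threshold=0.10)
```

A feature whose best split yields a gain close to the full label entropy is highly discriminative, which is the sense in which the five reported features are described as informative.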

Paper Structure

This paper contains 21 sections, 8 figures, 2 tables.

Figures (8)

  • Figure 1: An operational overview of TraceTarnish, wherein the attack passes a text-only message through a process that (1) round-trip translates it using machine translation, (2) obfuscates the text by paraphrasing, and (3) embeds noise via steganography.
  • Figure 2: Our dataset containing the inputs fed to and the outputs retrieved from TraceTarnish. Rows assigned a "0" indicate raw Reddit comments; rows assigned a "1" represent Reddit comments that have been anonymized via TraceTarnish. To eliminate the influence and subsequent variability of input text length, we ensured that the material passed to TraceTarnish was of a consistent and uniform length.
  • Figure 3: The StyloMetrix vectors produced from our TraceTarnish data.
  • Figure 4: A collection of radar charts that visually represent the contents of Table \ref{tab:Top_StyloMetrix_Features_Raw_Readings}, illustrating how the isolated StyloMetrix feature scores plummet when anonymizing text.
  • Figure 5: A dendrogram visualizing the computed Burrows's Delta values for the first five pairs of anonymized and non-anonymized text samples. The dendrogram labels are color-coded by group, following our established naming convention: a group of paired texts is denoted by the shared Roman-numeral prefix in their filenames. All odd filename suffix values, e.g., "001" in "I_data_point_001.txt," correspond to the "NANON" (non-anonymized) label; the even suffixes correspond to the "ANON" (anonymized) label. In general, excluding "V_data_point_009.txt," there is an apparent divide between the "NANON" and "ANON" samples. The code used to generate the graph is courtesy of James O'Sullivan [O'Sullivan2024].
  • ...and 3 more figures
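
The Burrows's Delta comparison and filename convention described in the Figure 5 caption can be sketched as follows. This is a simplified illustration, not the code credited in the caption: the helper names, the tokenizer, the use of the population standard deviation, the `top_n` cutoff, and the toy texts are all assumptions introduced here.

```python
import re
import statistics
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase word tokenizer (a deliberately crude assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def label_from_filename(name):
    """Odd numeric suffix -> "NANON", even suffix -> "ANON"."""
    suffix = int(re.search(r"_(\d+)\.txt$", name).group(1))
    return "NANON" if suffix % 2 == 1 else "ANON"


def burrows_delta(texts, top_n=5):
    """Return {(name_a, name_b): delta} for every pair of texts.

    Delta is the mean absolute difference between the z-scored relative
    frequencies of the corpus's `top_n` most frequent words.
    """
    tokens = {name: tokenize(body) for name, body in texts.items()}
    corpus_counts = Counter()
    for toks in tokens.values():
        corpus_counts.update(toks)
    vocabulary = [word for word, _ in corpus_counts.most_common(top_n)]

    z_scores = {name: [] for name in texts}
    for word in vocabulary:
        freqs = {n: t.count(word) / max(len(t), 1) for n, t in tokens.items()}
        mean = statistics.mean(list(freqs.values()))
        # Guard against zero spread when every text uses the word equally.
        spread = statistics.pstdev(list(freqs.values())) or 1.0
        for name in texts:
            z_scores[name].append((freqs[name] - mean) / spread)

    return {
        (a, b): statistics.mean(
            [abs(x - y) for x, y in zip(z_scores[a], z_scores[b])]
        )
        for a, b in combinations(texts, 2)
    }


# Toy texts (made up); filenames follow the caption's naming convention.
texts = {
    "I_data_point_001.txt": "the cat sat on the mat and the dog sat by the door",
    "I_data_point_002.txt": "a feline rested upon a rug while a canine rested near a door",
    "II_data_point_003.txt": "the cat sat on the mat and the dog sat by the door",
}
deltas = burrows_delta(texts)
group_labels = {name: label_from_filename(name) for name in texts}
```

Pairwise distances of this kind are what a hierarchical clustering routine would consume to draw a dendrogram; smaller Delta values indicate more similar word-frequency profiles.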