Tuning for TraceTarnish: Techniques, Trends, and Testing Tangible Traits

Robert Dilworth


TL;DR

This work advances adversarial stylometry by evaluating TraceTarnish, a pipeline that combines machine translation, paraphrasing, and steganography to obfuscate authorial style in text. It introduces Injection as a fourth strategy. It then uses StyloMetrix with the Information Gain criterion to identify five highly informative stylometric cues that signal obfuscation. Using a Reddit-based dataset, the study shows how function-word usage, content-word distributions, and the Type-Token Ratio shift under anonymization. These shifts yield potential indicators of compromise and forensic beacons. The work also proposes enhancements to improve text coherence while preserving privacy, including an offline, self-hosted LLM for quality control and an adversarial imitation step.
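The before-and-after shift in function words, content words, and the Type-Token Ratio can be illustrated with a minimal sketch. It is not StyloMetrix, which computes its features with its own NLP pipeline. This sketch makes several assumptions:

- It uses a small, hypothetical function-word list.
- It treats every other token as a content word.
- It normalizes all counts by the total number of tokens.
- It computes the Type-Token Ratio on lowercased surface forms rather than on lemmas.

The helper names `style_profile` and `profile_shift` are illustrative only.

```python
import re

# Hypothetical, deliberately small function-word list (an assumption of this
# sketch); StyloMetrix identifies function words with its own NLP pipeline.
FUNCTION_WORDS = frozenset({
    "a", "an", "the", "and", "or", "but", "if", "of", "in", "on", "at", "to",
    "for", "with", "by", "from", "as", "is", "are", "was", "were", "be",
    "it", "this", "that", "i", "you", "he", "she", "we", "they", "not",
})


def tokenize(text: str) -> list[str]:
    """Lowercased word tokens; apostrophes are kept so "don't" stays one token."""
    return re.findall(r"[a-z']+", text.lower())


def style_profile(text: str) -> dict[str, float]:
    """Approximate function/content-word incidences and TTR, normalized by token count."""
    tokens = tokenize(text)
    keys = ("func_ratio", "func_type_ratio", "cont_ratio", "cont_type_ratio", "ttr")
    if not tokens:
        return dict.fromkeys(keys, 0.0)
    func = [t for t in tokens if t in FUNCTION_WORDS]
    cont = [t for t in tokens if t not in FUNCTION_WORDS]  # everything else counts as content
    n = len(tokens)
    return {
        "func_ratio": len(func) / n,            # function-word tokens per token
        "func_type_ratio": len(set(func)) / n,  # distinct function words per token
        "cont_ratio": len(cont) / n,            # content-word tokens per token
        "cont_type_ratio": len(set(cont)) / n,  # distinct content words per token
        "ttr": len(set(tokens)) / n,            # surface-form TTR (no lemmatization)
    }


def profile_shift(original: str, transformed: str) -> dict[str, float]:
    """Per-feature change (transformed minus original) for one pre/post pair."""
    before, after = style_profile(original), style_profile(transformed)
    return {key: after[key] - before[key] for key in before}


if __name__ == "__main__":
    # Invented example pair, not taken from the study's Reddit data.
    shift = profile_shift(
        "I think that the plan is good and we should do it.",
        "The proposal seems excellent; adoption is advisable.",
    )
    for name, delta in shift.items():
        print(f"{name}: {delta:+.3f}")
```

`profile_shift` needs both versions of a message. This is the kind of pre- and post-transformation comparison on which the signal appears to depend. A transformed text on its own yields only a raw profile, with nothing to measure the shift against.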

Abstract

In this study, we conducted a more rigorous evaluation of our attack script $\textit{TraceTarnish}$, which applies adversarial stylometry principles to anonymize the authorship of text-based messages. To assess the efficacy and utility of the attack, we sourced, processed, and analyzed Reddit comments. These comments were later alchemized into $\textit{TraceTarnish}$ data. The transformed $\textit{TraceTarnish}$ data was then processed with $\textit{StyloMetrix}$ to generate stylometric features. We culled these features using the Information Gain criterion, leaving only the most informative, predictive, and discriminative ones. Three groups of features yielded significant Information Gain readings:

- function words and function word types ($L\_FUNC\_A$ $\&$ $L\_FUNC\_T$);
- content words and content word types ($L\_CONT\_A$ $\&$ $L\_CONT\_T$);
- the Type-Token Ratio ($ST\_TYPE\_TOKEN\_RATIO\_LEMMAS$).

These cues (function-word frequencies, content-word distributions, and the Type-Token Ratio) serve as reliable indicators of compromise (IoCs), revealing when a text has been deliberately altered to mask its true author. They could likewise act as forensic beacons, alerting defenders to the presence of an adversarial stylometry attack. Granted, without the original message this signal may go largely unnoticed, since it appears to depend on comparing the text before and after transformation. "In trying to erase a trace, you often imprint a larger one." With this understanding, we framed $\textit{TraceTarnish}$'s operations and outputs around these five isolated features. We then used them to design and implement enhancements that further strengthen the attack.
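The Information Gain culling step can be sketched as follows. The sketch assumes a binary label marking each text as an original comment (0) or TraceTarnish output (1). The abstract does not say how continuous StyloMetrix values were discretized. This sketch therefore assumes one common choice: it scores each feature by its best single binary threshold, as in C4.5-style decision trees. The helpers `information_gain` and `select_features` are hypothetical, and the toy values are invented for illustration, not drawn from the study. As an alternative scorer, scikit-learn's `sklearn.feature_selection.mutual_info_classif` provides a nearest-neighbor mutual information estimator.

```python
import math
from collections import Counter


def entropy(labels: list[int]) -> float:
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values: list[float], labels: list[int]) -> float:
    """IG of the best single binary threshold split on one continuous feature.

    Thresholding is an assumption of this sketch; other discretizations
    (equal-width bins, MDL-based cuts) would give different scores.
    """
    base = entropy(labels)
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = 0.0
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # only place thresholds between distinct feature values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        conditional = (i / n) * entropy(left) + ((n - i) / n) * entropy(right)
        best = max(best, base - conditional)
    return best


def select_features(
    table: dict[str, list[float]], labels: list[int], k: int
) -> list[tuple[str, float]]:
    """Keep the k features with the highest IG (the culling step)."""
    scores = {name: information_gain(vals, labels) for name, vals in table.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]


if __name__ == "__main__":
    # Toy, invented values: label 0 = original comment, 1 = transformed text.
    labels = [0, 0, 0, 1, 1, 1]
    table = {
        "informative": [0.50, 0.55, 0.60, 0.30, 0.32, 0.35],
        "noise": [1.0] * 6,
    }
    print(select_features(table, labels, k=1))
```

On this toy table, the feature that cleanly separates the two classes scores a full bit, and the constant feature scores zero. The five features the study retained would be the StyloMetrix columns that rank highest under its actual procedure, which this sketch only approximates.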

