Unveiling Unicode's Unseen Underpinnings in Undermining Authorship Attribution
Robert Dilworth
TL;DR
This paper investigates how Unicode zero-width steganography can augment adversarial stylometry to obscure authorship attribution, weighing its privacy benefits against its potential for misuse. It presents a multi-technique framework, embodied in the TraceTarnish pipeline, that combines imitation, translation, obfuscation, and zero-width steganography, and it provides a Python-based proof of principle for encoding hidden signals in text. The paper defines three evaluation metrics (soundness, safety, and sensibility) and reports preliminary results that use Burrows' Delta to gauge how far attribution is disrupted, while discussing ethical, legal, and real-world implications. The work highlights the potential of privacy-enhancing mechanisms in digital communication and outlines concrete directions for future experiments, corpus design, and code dissemination.
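The TL;DR refers to a Python-based proof of principle for encoding hidden signals in text. The following is a minimal illustrative sketch, not the TraceTarnish implementation. It assumes a hypothetical binary alphabet of two invisible code points: U+200B ZERO WIDTH SPACE encodes a 0 bit and U+200C ZERO WIDTH NON-JOINER encodes a 1 bit. It hides the UTF-8 bytes of a secret after the first visible character of a cover message. The function names `encode`, `decode`, and `strip_zero_width` are also hypothetical.

```python
# Hypothetical two-symbol alphabet (an assumption for illustration only).
ZW_ZERO = "\u200b"  # ZERO WIDTH SPACE encodes bit 0
ZW_ONE = "\u200c"   # ZERO WIDTH NON-JOINER encodes bit 1
ZW_CHARS = {ZW_ZERO, ZW_ONE}


def encode(cover: str, secret: str) -> str:
    """Hide the UTF-8 bytes of `secret` as zero-width characters inside `cover`."""
    bits = "".join(f"{byte:08b}" for byte in secret.encode("utf-8"))
    payload = "".join(ZW_ONE if bit == "1" else ZW_ZERO for bit in bits)
    # Place the invisible payload right after the first visible character.
    return cover[:1] + payload + cover[1:]


def decode(stego: str) -> str:
    """Recover the hidden string by reading the zero-width characters in order."""
    bits = "".join("1" if ch == ZW_ONE else "0" for ch in stego if ch in ZW_CHARS)
    # Regroup bits into bytes, ignoring any incomplete trailing group.
    usable = len(bits) - len(bits) % 8
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))
    return data.decode("utf-8")


def strip_zero_width(text: str) -> str:
    """Remove the payload characters, leaving only the visible text."""
    return "".join(ch for ch in text if ch not in ZW_CHARS)


if __name__ == "__main__":
    cover = "See you at the usual place."
    stego = encode(cover, "hi")
    print(stego == cover)                    # False: code points differ
    print(strip_zero_width(stego) == cover)  # True: visible text is unchanged
    print(decode(stego))                     # hi
```

Run as a script, it prints `False`, `True`, and `hi`. The stego string differs from the cover at the code-point level, yet it typically renders identically, and the payload round-trips intact. One caveat of this sketch is that a cover text already containing U+200B or U+200C would corrupt decoding. A sturdier scheme would therefore need a delimiter or a reserved alphabet.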
Abstract
When using a public communication channel -- whether formal or informal, such as commenting or posting on social media -- end users have no reasonable expectation of privacy: they compose a message and broadcast it for the world to see. Even if an end user takes utmost precautions to anonymize their online presence -- using an alias or pseudonym; masking their IP address; spoofing their geolocation; concealing their operating system and user agent; deploying encryption; registering with a disposable phone number or email; disabling non-essential settings; revoking permissions; and blocking cookies and fingerprinting -- one obvious element still lingers: the message itself. Assuming they avoid lapses in judgment or accidental self-exposure, there should be little evidence to reveal their actual identity, right? Wrong. The content of their message -- necessarily open for public consumption -- exposes an attack vector: stylometric analysis, or author profiling. In this paper, we dissect the technique of stylometry, discuss the opposing strategy of adversarial stylometry, and devise enhancements to it through Unicode steganography.
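To show concretely why the message itself is an attack vector, here is a minimal sketch of Burrows' Delta, the attribution measure named in the TL;DR. It follows the standard formulation. The features are the most frequent words across the candidate corpus, each word's relative frequency is z-scored against the candidates, and Delta is the mean absolute difference of those z-scores, with the lowest score indicating the likeliest author. The ASCII-letter regex tokenizer, the toy corpus, and the names `tokenize`, `relative_freqs`, and `burrows_delta` are assumptions for illustration. They do not reflect the paper's experimental setup.

```python
from __future__ import annotations

import re
import statistics
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase word tokenizer (a simplifying assumption; real studies tune this)."""
    return re.findall(r"[a-z']+", text.lower())


def relative_freqs(tokens: list[str], vocab: list[str]) -> list[float]:
    """Relative frequency of each vocabulary word in a token list."""
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[w] / total for w in vocab]


def burrows_delta(candidates: dict[str, str], disputed: str, n_mfw: int = 30) -> dict[str, float]:
    """Delta from `disputed` to each candidate; the lowest score is the attributed author."""
    tokenized = {author: tokenize(text) for author, text in candidates.items()}
    # Features: the n most frequent words across the whole candidate corpus.
    pooled = Counter(tok for toks in tokenized.values() for tok in toks)
    vocab = [word for word, _ in pooled.most_common(n_mfw)]
    profiles = {author: relative_freqs(toks, vocab) for author, toks in tokenized.items()}
    # Per-word mean and population standard deviation across candidates.
    columns = list(zip(*profiles.values()))
    means = [statistics.mean(col) for col in columns]
    stdevs = [statistics.pstdev(col) or 1.0 for col in columns]  # avoid division by zero

    def zscores(freqs: list[float]) -> list[float]:
        return [(f - m) / s for f, m, s in zip(freqs, means, stdevs)]

    z_disputed = zscores(relative_freqs(tokenize(disputed), vocab))
    # Delta = mean absolute z-score difference over the feature words.
    return {
        author: statistics.mean(abs(a - b) for a, b in zip(zscores(freqs), z_disputed))
        for author, freqs in profiles.items()
    }


if __name__ == "__main__":
    corpus = {
        "alice": "the cat sat on the mat and the dog sat by the door",
        "bob": "a bird flew over a tree while a fox ran to a den",
    }
    disputed = "the dog sat on the mat and the cat ran by the door"
    # A zero-width space inside "the" splits it into "th" + "e" for this tokenizer.
    obfuscated = disputed.replace("the", "th\u200be")
    print(burrows_delta(corpus, disputed, n_mfw=10))
    print(burrows_delta(corpus, obfuscated, n_mfw=10))
```

With this toy data, `alice` receives the lower Delta in both runs. However, the gap between the two candidates narrows once a zero-width space fragments the function word "the" into "th" and "e". That effect depends entirely on tokenization. A pipeline that strips Unicode format characters (general category Cf, which includes U+200B) before counting words would undo it.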
