JAMDEC: Unsupervised Authorship Obfuscation using Constrained Decoding over Small Language Models

Jillian Fisher, Ximing Lu, Jaehun Jung, Liwei Jiang, Zaid Harchaoui, Yejin Choi

TL;DR

JamDec tackles the privacy risk of authorship attribution by offering an unsupervised inference-time obfuscation method that uses small, open-source language models (e.g., GPT2-XL). It frames obfuscation as constrained decoding guided by automatically extracted keywords, followed by filtering to preserve meaning and fluency. The approach has three stages: keyword extraction, constrained generation via Constrained Diverse Beam Search, and post-generation filtering. It achieves obfuscation performance competitive with much larger models like GPT-3.5 while maintaining content fidelity, as demonstrated on scholarly and diary-like datasets. The work advances practical, privacy-preserving tools for online writing and blind review that run on modestly sized models.
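The three stages named above can be wired together as in the following Python sketch. It is illustrative only and makes several assumptions. `extract_keywords` is a frequency-based stand-in, not the paper's keyword extractor. `generate` is a placeholder for the constrained decoder (e.g., Constrained Diverse Beam Search over GPT2-XL). The NLI and CoLA scorers are passed in as callables. The keyword check, the 0.5 thresholds, and the highest-NLI selection rule are hypothetical choices rather than JamDec's published settings.

```python
import re
from collections import Counter
from typing import Callable, List

# Small hypothetical stopword list used only by the toy keyword extractor.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "is",
             "are", "was", "it", "i", "we", "that", "this", "for", "with"}


def tokenize(text: str) -> List[str]:
    """Lowercase word tokens; punctuation is dropped."""
    return re.findall(r"[a-z']+", text.lower())


def extract_keywords(text: str, k: int = 3) -> List[str]:
    """Stage 1 (toy stand-in): the k most frequent non-stopword tokens."""
    counts = Counter(t for t in tokenize(text) if t not in STOPWORDS)
    # Counter.most_common orders equal counts by first occurrence.
    return [word for word, _ in counts.most_common(k)]


def keeps_keywords(candidate: str, keywords: List[str]) -> bool:
    """True if every keyword appears as a token of the candidate."""
    words = set(tokenize(candidate))
    return all(kw in words for kw in keywords)


def filter_candidates(original: str, candidates: List[str],
                      keywords: List[str],
                      nli_score: Callable[[str, str], float],
                      cola_score: Callable[[str], float],
                      nli_min: float = 0.5,
                      cola_min: float = 0.5) -> List[str]:
    """Stage 3: drop unchanged copies, candidates missing a keyword, and
    candidates below the (hypothetical) NLI or CoLA thresholds."""
    kept = []
    for cand in candidates:
        if cand.strip().lower() == original.strip().lower():
            continue  # an unchanged copy obfuscates nothing
        if not keeps_keywords(cand, keywords):
            continue
        if nli_score(original, cand) < nli_min or cola_score(cand) < cola_min:
            continue
        kept.append(cand)
    return kept


def obfuscate(text: str,
              generate: Callable[[str, List[str]], List[str]],
              nli_score: Callable[[str, str], float],
              cola_score: Callable[[str], float],
              k: int = 3) -> str:
    keywords = extract_keywords(text, k)    # stage 1: keyword extraction
    candidates = generate(text, keywords)   # stage 2: plug in a constrained decoder
    kept = filter_candidates(text, candidates, keywords, nli_score, cola_score)
    # Assumed selection rule: highest NLI score; keep the input if nothing passes.
    return max(kept, key=lambda c: nli_score(text, c)) if kept else text


if __name__ == "__main__":
    original = "The cat sat on the mat with the cat."

    def fake_generate(text: str, kws: List[str]) -> List[str]:
        return [original,                   # unchanged copy: filtered out
                "A dog sat nearby.",        # missing keyword "cat": filtered out
                "On the mat, a cat sat."]   # passes every check

    def overlap(a: str, b: str) -> float:
        # Crude word-overlap stand-in for an NLI entailment score.
        return len(set(tokenize(a)) & set(tokenize(b))) / len(set(tokenize(a)))

    print(obfuscate(original, fake_generate, overlap, lambda s: 1.0, k=2))
    # -> On the mat, a cat sat.
```

Injecting the generator and scorers as callables keeps the sketch runnable without model downloads. A real setup would swap in GPT2-XL decoding and trained NLI and CoLA classifiers at those points.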

Abstract

The permanence of online content, combined with increasingly capable authorship identification techniques, calls for stronger computational methods to protect the identity and privacy of online authors when needed, e.g., blind reviews for scientific papers, anonymous online reviews, or anonymous interactions in mental health forums. In this paper, we propose an unsupervised inference-time approach to authorship obfuscation that addresses its unique challenges: the lack of supervision data for diverse authors and domains, and the need for revision that goes beyond simple paraphrasing to obscure authorship, all while preserving the original content and fluency. We introduce JAMDEC, a user-controlled, inference-time algorithm for authorship obfuscation that can, in principle, be applied to any text and authorship. Our approach builds on small language models such as GPT2-XL, which helps avoid disclosing the original content to proprietary LLM APIs, while reducing the performance gap between small and large language models through algorithmic enhancement. The key idea behind our approach is to boost the creative power of smaller language models through constrained decoding, while also allowing for user-specified controls and flexibility. Experimental results demonstrate that our approach based on GPT2-XL outperforms previous state-of-the-art methods based on comparably small models, while performing competitively against GPT-3.5 175B, a proprietary model that is two orders of magnitude larger.
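The core idea, steering a small model with keyword constraints at decoding time, can be illustrated with a heavily simplified lexically constrained beam search in Python. This is not JamDec's Constrained Diverse Beam Search. The hypothetical `BIGRAM` table stands in for GPT2-XL. Keywords are enforced by a soft ranking bonus during search plus a hard coverage check on finished hypotheses. The group-wise diversity penalty of diverse beam search is omitted. The `bonus`, `beam_size`, and `max_len` values are arbitrary.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical toy bigram "language model": BIGRAM[prev][next] = P(next | prev).
# A real system would query GPT2-XL for next-token log-probabilities instead.
BIGRAM: Dict[str, Dict[str, float]] = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"study": 0.5, "method": 0.3, "results": 0.2},
    "a": {"method": 0.7, "study": 0.3},
    "study": {"shows": 0.6, "</s>": 0.4},
    "method": {"works": 0.5, "shows": 0.2, "</s>": 0.3},
    "results": {"</s>": 1.0},
    "shows": {"results": 0.9, "</s>": 0.1},
    "works": {"</s>": 1.0},
}

Hypothesis = Tuple[List[str], float]  # (tokens, cumulative log-probability)


def covered(tokens: List[str], keywords: List[str]) -> int:
    """Number of distinct keywords already present in the hypothesis."""
    return len(set(keywords) & set(tokens))


def constrained_beam_search(keywords: List[str], beam_size: int = 3,
                            max_len: int = 6,
                            bonus: float = 2.0) -> Optional[str]:
    beams: List[Hypothesis] = [(["<s>"], 0.0)]
    finished: List[Hypothesis] = []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates: List[Hypothesis] = [
            (tokens + [nxt], logp + math.log(p))
            for tokens, logp in beams
            for nxt, p in BIGRAM.get(tokens[-1], {}).items()
        ]
        # Soft constraint: rank by LM score plus a bonus per satisfied keyword.
        candidates.sort(key=lambda h: h[1] + bonus * covered(h[0], keywords),
                        reverse=True)
        beams = []
        for tokens, logp in candidates[:beam_size]:
            if tokens[-1] == "</s>":
                finished.append((tokens, logp))
            else:
                beams.append((tokens, logp))
        if not beams:
            break
    # Hard constraint: only finished hypotheses containing every keyword qualify.
    valid = [h for h in finished if covered(h[0], keywords) == len(set(keywords))]
    if not valid:
        return None
    best_tokens, _ = max(valid, key=lambda h: h[1])
    return " ".join(best_tokens[1:-1])  # strip <s> and </s>


if __name__ == "__main__":
    print(constrained_beam_search([]))          # -> the study shows results
    print(constrained_beam_search(["method"]))  # -> a method works
```

With no keywords, the search returns the highest-scoring sentence the beam finds. Adding the keyword `method` pushes it to a different, lower-probability sentence that contains the keyword. An unreachable keyword yields `None`.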

Paper Structure (42 sections, 7 equations, 12 figures, 16 tables, 2 algorithms)

Figures (12)

  • Figure 1: JamDec framework.
  • Figure 2: Highlighting the trade-offs between obfuscation (Drop Rate (ENS)), content preservation (NLI), and language quality (CoLA) of each method for the AMT-10 and BLOG-10 datasets. The dotted line indicates the trend through all methods.
  • Figure 3: Qualitative examples of obfuscated text created by each method. The sentences are taken from the AMT-3 dataset. Changes to the original are outlined in blue (grammatically correct and in context) and red (grammatically incorrect or out of context).
  • Figure 4: Human Evaluation on 102 random samples from AMT-3. We include two versions of our method with differing filtering stages (with and without Stylo).
  • Figure 5: Human Evaluation on 102 random samples from AMT-3. We include two versions of JamDec+Stylo, the original that uses a final CoLA threshold (JamDec+Stylo+W/_Threshold) and one that does not use this threshold (JamDec+Stylo+W/O_Threshold).
  • ...and 7 more figures