Mitigating Gradient Inversion Risks in Language Models via Token Obfuscation

Xinguo Feng; Zhongkui Ma; Zihan Wang; Alsharif Abuadbba; Guangdong Bai

TL;DR

The paper tackles the privacy risk that gradient inversion attacks (GIAs) pose to collaborative language-model training, where adversaries recover private data from shared gradients. It introduces GHOST, a token-level defense that replaces original tokens with shadow tokens while preserving utility in the embedding and gradient spaces. GHOST works in two stages: it first searches for shadow candidates that are semantically distinct from, yet close in embedding space to, the original tokens, and it then selects the candidates that least disrupt the model's outputs. The authors provide a formal analysis showing that the utility loss is small, $L(\mathbf{x}; \tilde{\boldsymbol{\theta}}) - L(\tilde{\mathbf{x}}; \tilde{\boldsymbol{\theta}}) = O(\epsilon)$, while gradient leakage is suppressed, with $\| \nabla_{\tilde{\boldsymbol{\theta}}} L(\mathbf{x}; \tilde{\boldsymbol{\theta}}) - \nabla_{\tilde{\boldsymbol{\theta}}} L(\tilde{\mathbf{x}}; \tilde{\boldsymbol{\theta}}) \| = O(1)$ as $\epsilon \to 0$. Empirically, GHOST achieves strong privacy protection (token-recovery rates near 1-2%) and preserves utility on classification and generation tasks (e.g., classification F1 up to 0.92 and perplexity as low as 5.45) across diverse models (BERT, Llama, Gemma) and datasets, and it remains resilient to adaptive GIAs. Comparisons with gradient-noise and gradient-pruning baselines show that GHOST achieves the best privacy-utility trade-off, underscoring the value of token-level obfuscation. The work suggests a shift from gradient-centric defenses toward token-level strategies that decouple the token, embedding, and gradient spaces for privacy in collaborative learning with large language models.
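The following toy sketch makes the two quantities in the formal analysis concrete. For one original/shadow embedding pair, it computes the loss gap $L(\mathbf{x}; \tilde{\boldsymbol{\theta}}) - L(\tilde{\mathbf{x}}; \tilde{\boldsymbol{\theta}})$ and the norm of the gradient difference. Every ingredient is hypothetical: the model (a linear scorer with squared-error loss), the parameter vector `theta_tilde`, the target value, and both embeddings. The sketch shows what is being measured. It does not reproduce the paper's models, and it does not demonstrate the asymptotic result.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def loss(emb: Vector, theta: Vector, target: float) -> float:
    # Hypothetical toy loss: 0.5 * (theta . e - y)^2 on a linear scorer.
    return 0.5 * (dot(theta, emb) - target) ** 2


def grad_theta(emb: Vector, theta: Vector, target: float) -> Vector:
    # Analytic gradient of the toy loss w.r.t. theta: (theta . e - y) * e.
    residual = dot(theta, emb) - target
    return [residual * e for e in emb]


def deviations(emb_orig: Vector, emb_shadow: Vector, theta: Vector, target: float):
    # Loss gap L(x; theta) - L(x_tilde; theta) and L2 norm of the gradient difference.
    loss_gap = loss(emb_orig, theta, target) - loss(emb_shadow, theta, target)
    g_orig = grad_theta(emb_orig, theta, target)
    g_shadow = grad_theta(emb_shadow, theta, target)
    grad_gap = math.sqrt(sum((a - b) ** 2 for a, b in zip(g_orig, g_shadow)))
    return loss_gap, grad_gap


theta_tilde = [2.0, 1.0]  # hypothetical fine-tuned parameters
loss_gap, grad_gap = deviations([1.0, 0.0], [0.9, 0.1], theta_tilde, target=1.0)
print(f"loss gap = {loss_gap:.3f}, gradient gap = {grad_gap:.3f}")
```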

Abstract

Training and fine-tuning large-scale language models benefit greatly from collaborative learning, but the approach has been shown to be vulnerable to gradient inversion attacks (GIAs), which allow adversaries to reconstruct private training data from shared gradients. Existing defenses mainly employ gradient perturbation techniques, e.g., noise injection or gradient pruning, to disrupt GIAs' direct mapping from gradient space to token space. However, these methods often fall short because semantic similarity is retained across the gradient, embedding, and token spaces. In this work, we propose a novel defense mechanism named GHOST (gradient shield with obfuscated tokens), a token-level obfuscation mechanism that neutralizes GIAs by decoupling the inherent connections across the gradient, embedding, and token spaces. GHOST is built upon an important insight: because the token space is so large, there exist semantically distinct yet embedding-proximate tokens that can serve as shadow substitutes for the original tokens. This enables a semantic disconnection in the token space while preserving the connection in the embedding and gradient spaces. GHOST comprises a searching step, which identifies semantically distinct candidate tokens using a multi-criteria searching process, and a selection step, which picks optimal shadow tokens that minimally disrupt features critical for training by preserving alignment with the internal outputs produced by the original tokens. Evaluation across diverse model architectures (from BERT to Llama) and datasets demonstrates the remarkable effectiveness of GHOST in protecting privacy (recovery rates as low as 1%) and preserving utility (classification F1 up to 0.92 and perplexity as low as 5.45). These results hold in both classification and generation tasks, against state-of-the-art GIAs and in adaptive attack scenarios.
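The two-stage search-and-selection idea can be illustrated with a toy sketch. Everything in it is an assumption made for illustration:

- the two-dimensional embedding table;
- the `related` map, a hypothetical stand-in for GHOST's multi-criteria semantic-distinctness check;
- the `radius` threshold;
- the single tanh layer `hidden`, a stand-in for the model's internal outputs.

It is not the authors' algorithm. It only shows the shape of the approach: keep embedding-proximate tokens that are not semantically related, then pick the one whose internal output best matches the original's.

```python
import math
from typing import Dict, List, Optional, Set

Vector = List[float]


def l2(a: Vector, b: Vector) -> float:
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search_candidates(
    token: str,
    emb: Dict[str, Vector],
    related: Dict[str, Set[str]],
    radius: float,
) -> List[str]:
    """Stage 1 (toy): tokens within `radius` in embedding space that are
    not listed as semantically related to `token` (hypothetical criterion)."""
    origin = emb[token]
    excluded = related.get(token, set()) | {token}
    return [
        t for t, vec in emb.items()
        if t not in excluded and l2(origin, vec) <= radius
    ]


def hidden(vec: Vector, weights: List[Vector]) -> Vector:
    """Hypothetical internal output: one linear layer followed by tanh."""
    return [math.tanh(sum(w * x for w, x in zip(row, vec))) for row in weights]


def select_shadow(
    token: str,
    candidates: List[str],
    emb: Dict[str, Vector],
    weights: List[Vector],
) -> Optional[str]:
    """Stage 2 (toy): the candidate whose internal output is closest to
    the original token's internal output; None if there are no candidates."""
    if not candidates:
        return None
    target = hidden(emb[token], weights)
    return min(candidates, key=lambda t: l2(hidden(emb[t], weights), target))


# Toy vocabulary; all values are invented for illustration.
emb = {
    "paris": [1.0, 0.0],
    "france": [0.95, 0.05],  # embedding-close but semantically related
    "lamp": [0.9, 0.2],
    "river": [0.8, -0.1],
    "guitar": [-1.0, 0.5],   # unrelated but far away in embedding space
}
related = {"paris": {"france"}}
weights = [[1.0, 0.5], [0.0, 1.0]]

candidates = search_candidates("paris", emb, related, radius=0.3)
shadow = select_shadow("paris", candidates, emb, weights)
print(candidates, shadow)
```

Here `lamp` and `river` lie at the same embedding distance from `paris`, so the search stage alone cannot separate them. The selection stage decides: `river` is chosen because its toy internal output is closer to that of `paris`.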

Paper Structure (41 sections, 2 theorems, 13 equations, 4 figures, 7 tables, 2 algorithms)

Introduction
Background
Language Modeling
Tokens and Embeddings
Gradient Inversion Attacks
Approach
Threat Model
Overview of Ghost
Searching
Selection
Theoretical Analysis
Assumptions
Model Utility Preservation and Defense against GIAs
Model Utility Preservation
Effective Defense against GIAs
...and 26 more sections

Key Result

Theorem 1

Given the pre-trained model $L(\bm{x}; \bm{\theta})$ and the model $L(\bm{x}; \tilde{\bm{\theta}})$ fine-tuned on the obfuscated dataset $\widetilde{\mathcal{D}}$, for any pair of an original data point $\bm{x} \in \mathcal{D}$ and its obfuscated data point $\tilde{\bm{x}} \in \widetilde{\mathcal{D}}$, we have $L(\bm{x}; \tilde{\bm{\theta}}) - L(\tilde{\bm{x}}; \tilde{\bm{\theta}}) = O(\epsilon)$, where $\epsilon$ is defined by the small-update assumption. More strictly, we have … where $k$ …

Figures (4)

Figure 1: Overview of Ghost.
Figure 2: Defense efficacy against GIAs on BERT models (greater coverage indicates a stronger defense).
Figure 3: Upper Bounds of $\epsilon$.
Figure 4: Loss and gradient deviation comparison.

Theorems & Definitions (2)

Theorem 1: Model utility preservation
Theorem 2: Effective defense against GIAs
