
Concept-Aware Privacy Mechanisms for Defending Embedding Inversion Attacks

Yu-Che Tsai, Hsiang Hsiao, Kuan-Yu Chen, Shou-De Lin

TL;DR

This paper addresses embedding inversion risks in text representations by replacing uniform noise injection with concept-aware privacy protection. It introduces SPARSE, a two-part framework that first learns a differentiable neuron mask to identify the embedding dimensions that are privacy-sensitive for a user-defined concept, and then perturbs those dimensions with a Mahalanobis mechanism that injects elliptical noise calibrated to each dimension's sensitivity. Across six datasets, three embedding models, and multiple attack models, the approach reduces privacy leakage while preserving downstream utility, outperforming state-of-the-art local DP baselines and approaching a white-box upper bound. The work provides evidence for the existence of privacy neurons and shows that targeted, sensitivity-guided perturbation can significantly reduce leakage while maintaining semantic integrity, offering a practical path toward controllable, concept-aware privacy in NLP embeddings.

Abstract

Text embeddings enable numerous NLP applications but face severe privacy risks from embedding inversion attacks, which can expose sensitive attributes or reconstruct raw text. Existing differential privacy defenses assume uniform sensitivity across embedding dimensions, leading to excessive noise and degraded utility. We propose SPARSE, a user-centric framework for concept-specific privacy protection in text embeddings. SPARSE combines (1) differentiable mask learning to identify privacy-sensitive dimensions for user-defined concepts, and (2) the Mahalanobis mechanism that applies elliptical noise calibrated by dimension sensitivity. Unlike traditional spherical noise injection, SPARSE selectively perturbs privacy-sensitive dimensions while preserving non-sensitive semantics. Evaluated across six datasets with three embedding models and attack scenarios, SPARSE consistently reduces privacy leakage while achieving superior downstream performance compared to state-of-the-art DP methods.

Paper Structure

This paper contains 45 sections, 4 theorems, 26 equations, 3 figures, 12 tables, 2 algorithms.

Key Result

Theorem 1

Given a privacy parameter $\epsilon$, the Mahalanobis mechanism outputting $\Phi'(\mathbf{s}) \sim \mathcal{M}\left(\Phi\left(\mathbf{s}\right)\right)$ fulfills $\epsilon d$-LDP with respect to the Mahalanobis Norm.
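
The elliptical noise behind this theorem can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a diagonal covariance built directly from a per-dimension sensitivity vector, a uniform direction on the unit sphere, and a radius drawn from a Gamma distribution with shape $d$ and scale $1/\epsilon$, as is common in Laplace-style metric-DP mechanisms. The function names, the `floor` parameter, and the trace normalisation are hypothetical choices made for illustration, and the sketch makes no claim to satisfy the guarantee stated in Theorem 1.

```python
import math
import random


def sample_elliptical_noise(scales, epsilon, rng):
    """Draw one noise vector whose spread along dimension i is proportional to scales[i]."""
    d = len(scales)
    # Uniform direction on the unit sphere: normalise a standard Gaussian draw.
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in g))
    direction = [v / norm for v in g]
    # Radius from Gamma(shape=d, scale=1/epsilon); a smaller epsilon gives larger noise.
    radius = rng.gammavariate(d, 1.0 / epsilon)
    # With a diagonal covariance, stretching each axis applies Sigma^(1/2) to the direction.
    return [radius * s * u for s, u in zip(scales, direction)]


def perturb(embedding, sensitivity, epsilon, floor=0.05, seed=None):
    """Add elliptical noise to an embedding, guided by per-dimension sensitivity in [0, 1]."""
    rng = random.Random(seed)
    d = len(embedding)
    # Assumption: variance along each axis is the squared sensitivity, floored so that
    # non-sensitive dimensions still receive a small amount of noise.
    variances = [max(floor, s) ** 2 for s in sensitivity]
    # Assumption: rescale so the variances sum to d, keeping the overall noise budget fixed.
    mean_var = sum(variances) / d
    scales = [math.sqrt(v / mean_var) for v in variances]
    noise = sample_elliptical_noise(scales, epsilon, rng)
    return [x + n for x, n in zip(embedding, noise)]


if __name__ == "__main__":
    emb = [0.2, -0.1, 0.4, 0.0]
    sens = [1.0, 0.0, 0.0, 1.0]  # dimensions 0 and 3 are treated as privacy-sensitive
    print(perturb(emb, sens, epsilon=5.0, seed=0))
```

With the sensitivity vector in the usage example, dimensions 0 and 3 receive noise with roughly twenty times the scale of dimensions 1 and 2, which is the selective-perturbation behaviour the TL;DR describes. Setting all sensitivities equal recovers spherical noise.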

Figures (3)

  • Figure 1: Illustration of embedding inversion attack and different defense strategies. (a) Sensitive information can be easily identified from non-protected text embeddings. (b) Adding spherical noise mitigates privacy leakage but harms textual semantics. (c) Our approach applies elliptical noise guided by a user-defined privacy concept, selectively adding stronger perturbations to privacy-sensitive dimensions while preserving non-sensitive semantics. A real-world case study is presented in the appendix.
  • Figure 2: Visualization of the learned neuron mask by SPARSE for individual tokens, where larger values represent higher privacy sensitivity.
  • Figure 3: Sensitivity distribution comparison between the top and bottom 10% privacy neurons. The Wilcoxon Signed Rank Test indicates a highly significant difference (p-value $= 1.30 \times 10^{-21}$).

Theorems & Definitions (15)

  • Definition 1: Local Differential Privacy
  • Definition 2: Metric Local Differential Privacy
  • Definition 3: Generalized Laplace Mechanism [wu2017bolt]
  • Definition 4: Mahalanobis Norm
  • Definition 5: Mahalanobis Mechanism
  • Theorem 1
  • Lemma 1
  • Lemma 2
  • Definition 6: Privacy Neurons
  • Definition 7: Neuron Sensitivity
  • ...and 5 more