STAMP: Selective Task-Aware Mechanism for Text Privacy

Fengwei Tian, Payel Bhattacharjee, Heidi Hanson, Geoffrey D. Rubin, Joseph Y. Lo, Ravi Tandon

Abstract

We present STAMP (Selective Task-Aware Mechanism for Text Privacy), a new framework for task-aware text privatization that achieves an improved privacy-utility trade-off. STAMP selectively allocates privacy budgets across tokens by jointly considering (i) each token's importance to the downstream task (as measured via a task- or query-specific representation), and (ii) its privacy sensitivity (e.g., names, dates, identifiers). This token-level partitioning enables fine-grained, group-wise control over the level of noise applied to different parts of the input, balancing privacy protection with task relevance. To privatize individual token embeddings, we introduce the polar mechanism, which perturbs only the direction of embeddings on the unit sphere while preserving their magnitude. Decoding is performed via cosine nearest-neighbor search, aligning the perturbation geometry with the decoding geometry. Unlike isotropic noise mechanisms, the polar mechanism maintains semantic neighborhoods in the embedding space and better preserves downstream utility. Experimental evaluations on SQuAD, Yelp, and AG News datasets demonstrate that STAMP, when combined with the normalized polar mechanism, consistently achieves superior privacy-utility trade-offs across varying per-token privacy budgets.
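
The polar mechanism described in the abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation. It assumes directional noise is drawn from a von Mises-Fisher (vMF) distribution, using Wood's rejection sampler, and that the concentration `kappa` is supplied directly. The mapping from a per-token privacy budget to `kappa` is not given in this overview and is left out. The function names `sample_vmf`, `polar_privatize`, and `decode_cosine` are hypothetical.

```python
import numpy as np

def sample_vmf(mu, kappa, rng):
    """Draw one sample from vMF(mu, kappa) on the unit sphere (Wood's rejection sampler)."""
    d = mu.shape[0]
    b = (-2.0 * kappa + np.sqrt(4.0 * kappa**2 + (d - 1) ** 2)) / (d - 1)
    x0 = (1.0 - b) / (1.0 + b)
    c = kappa * x0 + (d - 1) * np.log(1.0 - x0**2)
    while True:
        # Propose the cosine w between the sample and mu, then accept or reject.
        z = rng.beta((d - 1) / 2.0, (d - 1) / 2.0)
        w = (1.0 - (1.0 + b) * z) / (1.0 - (1.0 - b) * z)
        u = rng.uniform(low=np.finfo(float).tiny, high=1.0)
        if kappa * w + (d - 1) * np.log(1.0 - x0 * w) - c >= np.log(u):
            break
    # Uniform unit vector in the subspace orthogonal to mu.
    g = rng.standard_normal(d)
    g -= g.dot(mu) * mu
    g /= np.linalg.norm(g)
    return w * mu + np.sqrt(max(0.0, 1.0 - w**2)) * g

def polar_privatize(embedding, kappa, rng):
    """Perturb only the direction of `embedding`; its magnitude is kept unchanged."""
    radius = np.linalg.norm(embedding)
    direction = embedding / radius
    return radius * sample_vmf(direction, kappa, rng)

def decode_cosine(vector, vocab_embeddings):
    """Return the index of the vocabulary row with the highest cosine similarity."""
    unit_vocab = vocab_embeddings / np.linalg.norm(vocab_embeddings, axis=1, keepdims=True)
    return int(np.argmax(unit_vocab @ (vector / np.linalg.norm(vector))))

rng = np.random.default_rng(0)
vocab = rng.standard_normal((5, 8))   # toy vocabulary: 5 tokens, 8 dimensions
noisy = polar_privatize(vocab[2], kappa=1e4, rng=rng)
token_id = decode_cosine(noisy, vocab)
```

A larger `kappa` concentrates samples around the original direction (weaker perturbation), while a smaller `kappa` spreads them over the sphere. Because decoding compares directions only, the noise and the decoder operate in the same geometry, which is the alignment the abstract refers to.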

Paper Structure (38 sections, 4 theorems, 19 equations, 10 figures, 1 table, 1 algorithm)

Key Result

Theorem 1

Fix a task $T$ and assume that $T$ and its associated grouping map $g_T$ are public. Then STAMP satisfies task-aware metric LDP with budget vector $\boldsymbol{\epsilon}^T = (\epsilon_T^{(1)}, \epsilon_T^{(2)}, \epsilon_T^{(3)}, \epsilon_T^{(4)})$.
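
For orientation, the form in which $(\epsilon,\delta)$-metric LDP is commonly stated in the literature is given below. This is the standard formulation rather than a reproduction of the paper's Definition 2 or Definition 3, whose exact wording is not shown in this overview. The symbols $\mathcal{M}$ (the mechanism), $d$ (a metric on the input space), and $S$ (a measurable output set) are notation assumed here.

```latex
\Pr\left[\mathcal{M}(x) \in S\right]
  \;\le\; e^{\epsilon \, d(x, x')} \, \Pr\left[\mathcal{M}(x') \in S\right] + \delta
  \qquad \text{for all inputs } x, x' \text{ and all measurable } S .
```

Under this reading, the budget vector in Theorem 1 would supply one $\epsilon_T^{(k)}$ per token group determined by $g_T$, so that tokens in different groups receive different levels of protection.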

Figures (10)

  • Figure 1: Overview of the STAMP framework: Tokens are categorized according to their task and privacy relevance, and then perturbed using adaptively assigned privacy budgets via the polar mechanism. Figure (a) (left) illustrates the overall token perturbation pipeline with group-wise privacy budget allocation; (b) (right) details the grouping and budget assignment process based on task and privacy importance.
  • Figure 2: Prompt-based question answering example (in the baseline context, red bold text indicates privacy-sensitive tokens; blue bold text denotes task-relevant text): Uniform perturbation treats all tokens equally, wasting privacy on uninformative words and under-protecting sensitive but predictive ones. STAMP addresses this by stratifying tokens by privacy sensitivity and task relevance, and by aligning the privatization geometry with the decoding geometry.
  • Figure 3: STAMP privatization pipeline. Embeddings are decomposed into radial and angular components, perturbed under metric LDP, and decoded by angular proximity into privatized tokens suitable for downstream tasks.
  • Figure 4: Directional privatization using the Polar mechanism, which perturbs the angle of the normalized embedding vector while normalizing its magnitude.
  • Figure 5: The parameter $\tau$ serves as the critical decision boundary that classifies tokens as task-relevant or irrelevant based on their cosine similarity to the task representation. A sweep of $\tau$ reveals that downstream utility (measured by cosine similarity between original and privatized contexts) stabilizes around $\tau=0.5$, indicating a robust operating point that preserves task-critical signals without over-perturbing the context.
  • ...and 5 more figures
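
The grouping step referred to in Figures 1 and 5 can be sketched as follows, assuming a token counts as task-relevant when its cosine similarity to the task representation is at least $\tau$, and that a privacy-sensitivity flag per token is already available (for example from an entity tagger). The group numbering, the `>=` comparison, and the budget values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def assign_groups(token_embeddings, task_vector, sensitive_mask, tau=0.5):
    """Return one group index in {0, 1, 2, 3} per token.

    Assumed encoding: group = 2 * sensitive + relevant, so
    0 = non-sensitive and task-irrelevant, 3 = sensitive and task-relevant.
    """
    norms = np.linalg.norm(token_embeddings, axis=1) * np.linalg.norm(task_vector)
    similarity = token_embeddings @ task_vector / norms
    relevant = similarity >= tau
    return 2 * sensitive_mask.astype(int) + relevant.astype(int)

# Toy example: three 2-d token embeddings and a task direction.
tokens = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
task = np.array([1.0, 0.0])
sensitive = np.array([True, False, False])

groups = assign_groups(tokens, task, sensitive, tau=0.5)

# Hypothetical per-group budgets, indexed by group; the values are placeholders.
budgets = np.array([1.0, 8.0, 0.5, 4.0])
per_token_budget = budgets[groups]
```

Each token's budget would then control the strength of the noise applied to its embedding, which is how a single threshold $\tau$ ends up shaping the privacy-utility trade-off shown in the sweep.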

Theorems & Definitions (11)

  • Definition 1: $(\epsilon,\delta)$-LDP (token level)
  • Definition 2: $(\epsilon,\delta)$-metric LDP (token level)
  • Definition 3: Task-Aware Metric LDP
  • Theorem 1: Privacy Guarantee of STAMP
  • Theorem 2: Context-Level Privacy Guarantee of STAMP
  • Theorem 3: vMF mechanism satisfies metric LDP
  • Proposition 1: Equivalence of Semantic Decoding and Nearest-Neighbor Search on the Sphere
  • proof
  • proof
  • proof
  • ...and 1 more