Empirical Calibration and Metric Differential Privacy in Language Models

Pedro Faustini, Natasha Fernandes, Annabelle McIver, Mark Dras

TL;DR

The paper tackles empirical privacy calibration for NLP under differential privacy, showing MIAs are unreliable indicators while gradient-based reconstruction attacks provide clearer leakage signals as the privacy budget $\epsilon$ varies. It introduces metric DP with a directional VMF mechanism (DirDP-SGD) that perturbs gradient directions using the VMF distribution and compares it to standard isotropic Gaussian DP-SGD. Using GPT-2 and BERT on IMDb, SST2, and CoLA, it demonstrates that VMF can yield competitive utility and sometimes superior protection for short texts. The study highlights task- and model-dependent privacy-utility trade-offs and motivates broader adoption of directional privacy and gradient-based diagnostics in NLP privacy research.

Abstract

NLP models trained with differential privacy (DP) usually adopt the DP-SGD framework, and privacy guarantees are often reported in terms of the privacy budget $\epsilon$. However, $\epsilon$ does not have any intrinsic meaning, and it is generally not possible to compare across variants of the framework. Work in image processing has therefore explored how to empirically calibrate noise across frameworks using Membership Inference Attacks (MIAs). However, this kind of calibration has not been established for NLP. In this paper, we show that MIAs offer little help in calibrating privacy, whereas reconstruction attacks are more useful. As a use case, we define a novel kind of directional privacy based on the von Mises-Fisher (VMF) distribution, a metric DP mechanism that perturbs angular distance rather than adding (isotropic) Gaussian noise, and apply this to NLP architectures. We show that, even though formal guarantees are incomparable, empirical privacy calibration reveals that each mechanism has different areas of strength with respect to utility-privacy trade-offs.
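
To make the contrast between the two noise mechanisms concrete, the following is a minimal sketch and not the authors' implementation. It assumes a single gradient vector, uses Wood's (1994) rejection sampler to draw from the VMF distribution, and treats the concentration `kappa` as the knob playing the role of $\epsilon$. The function names (`sample_vmf`, `vmf_perturb`, `gaussian_perturb`) and the parameters `clip_norm` and `sigma` are hypothetical and chosen for illustration only.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit Euclidean length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def sample_vmf(mu, kappa, rng):
    """Draw one point from VMF(mu, kappa) on the unit sphere.

    Uses Wood's (1994) rejection sampler. Assumes mu is a unit vector,
    len(mu) >= 2 and kappa > 0.
    """
    k = len(mu)
    b = (-2.0 * kappa + math.sqrt(4.0 * kappa ** 2 + (k - 1) ** 2)) / (k - 1)
    x0 = (1.0 - b) / (1.0 + b)
    c = kappa * x0 + (k - 1) * math.log(1.0 - x0 ** 2)
    # Step 1: sample w, the cosine of the angle between the output and mu.
    while True:
        z = rng.betavariate((k - 1) / 2.0, (k - 1) / 2.0)
        w = (1.0 - (1.0 + b) * z) / (1.0 - (1.0 - b) * z)
        u = 1.0 - rng.random()  # lies in (0, 1], so log(u) is defined
        if kappa * w + (k - 1) * math.log(1.0 - x0 * w) - c >= math.log(u):
            break
    # Step 2: sample a uniformly random unit direction orthogonal to mu.
    g = [rng.gauss(0.0, 1.0) for _ in range(k)]
    dot = sum(gi * mi for gi, mi in zip(g, mu))
    v = normalize([gi - dot * mi for gi, mi in zip(g, mu)])
    # Step 3: combine the component along mu with the tangential component.
    s = math.sqrt(max(0.0, 1.0 - w * w))
    return [w * mi + s * vi for mi, vi in zip(mu, v)]


def vmf_perturb(grad, kappa, rng):
    """Directional noise (assumed form): keep only the gradient's direction
    and resample it from a VMF distribution centred on that direction."""
    return sample_vmf(normalize(grad), kappa, rng)


def gaussian_perturb(grad, clip_norm, sigma, rng):
    """Isotropic noise in the style of DP-SGD, simplified to one gradient:
    clip to clip_norm, then add N(0, (sigma * clip_norm)^2) per coordinate."""
    n = math.sqrt(sum(x * x for x in grad))
    scale = min(1.0, clip_norm / n) if n > 0 else 1.0
    return [x * scale + rng.gauss(0.0, sigma * clip_norm) for x in grad]
```

A larger `kappa` concentrates the VMF samples around the original direction, so the perturbed gradient stays closer in angle to the true one; a larger `sigma` spreads the Gaussian-perturbed gradient equally in all directions.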

Paper Structure

This paper contains 35 sections, 5 theorems, 7 equations, 6 figures, 8 tables, 1 algorithm.

Key Result

Theorem 1

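Let $\epsilon > 0$ and denote by $\mathbb{S}^{K-1}$ the unit sphere in $K$ dimensions. Then the VMF mechanism on $\mathbb{S}^{K-1}$ satisfies $\epsilon d_2$-privacy, where $d_2$ is the Euclidean metric. That is, for all $x, x' \in \mathbb{S}^{K-1}$ and all (measurable) $Y \subseteq \mathbb{S}^{K-1}$, $\Pr[\mathcal{M}(x) \in Y] \le e^{\epsilon d_2(x, x')} \Pr[\mathcal{M}(x') \in Y]$, where $\mathcal{M}(x)$ denotes the (random) output of the VMF mechanism on input $x$.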
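
The bound in Theorem 1 can be checked numerically at the level of densities. The sketch below assumes the VMF density has the form $f(y; x) \propto \exp(\epsilon \langle x, y \rangle)$ with concentration equal to $\epsilon$, so the normalising constant cancels in the ratio and $\log\big(f(y; x) / f(y; x')\big) = \epsilon \langle x - x', y \rangle \le \epsilon \lVert x - x' \rVert_2$ by Cauchy-Schwarz. The helper names are hypothetical, and the check is an illustration rather than a proof.

```python
import math
import random


def random_unit_vector(k, rng):
    """Uniformly random point on the unit sphere in k dimensions."""
    g = [rng.gauss(0.0, 1.0) for _ in range(k)]
    n = math.sqrt(sum(x * x for x in g))
    return [x / n for x in g]


def log_density_ratio(x, x_prime, y, eps):
    """log f(y; x) - log f(y; x') for VMF densities with concentration eps.

    The normalising constant depends only on eps and the dimension,
    so it cancels and only the exponent difference remains.
    """
    return eps * sum((a - b) * c for a, b, c in zip(x, x_prime, y))


def worst_slack(k, eps, trials, rng):
    """Minimum of eps * d2(x, x') - log ratio over random triples (x, x', y).

    A nonnegative result means no sampled triple violated the bound.
    """
    worst = float("inf")
    for _ in range(trials):
        x, x_prime, y = (random_unit_vector(k, rng) for _ in range(3))
        bound = eps * math.dist(x, x_prime)
        worst = min(worst, bound - log_density_ratio(x, x_prime, y, eps))
    return worst


if __name__ == "__main__":
    print(worst_slack(k=5, eps=2.0, trials=2000, rng=random.Random(0)))
```

The bound is tight: choosing $y$ parallel to $x - x'$ makes the log ratio equal to $\epsilon\, d_2(x, x')$.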

Figures (6)

  • Figure 1: Fig. 2(a) of Jayaraman and Evans (2019), a calibration plot of $\epsilon$ against privacy leakage for several DP-SGD variants.
  • Figure 2: Utility (accuracy or MCC) for models under different privacy settings across each dataset.
  • Figure 3: Illustration highlighting how gradients are perturbed with Gaussian noise versus VMF noise. The dark red arrow is the unperturbed gradient, and lighter red arrows are perturbations of angular distance $A$ that are more likely when close to the unperturbed gradient, as under the VMF mechanism. The blue arrows represent isotropic noise added to the gradient by DP-SGD.
  • Figure 4: Tradeoffs between privacy (ROUGE-L) and utility (accuracy or MCC), across the models and datasets under different privacy settings for the Decepticons attack. Top left is best.
  • Figure 5: LAMP attack: reconstruction (ROUGE-L) against utility (accuracy or MCC) for Gaussian (blue dots) and VMF (red dots) noises. Top left is best.
  • ...and 1 more figure
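
Figures 4 and 5 report reconstruction quality as ROUGE-L, which scores the longest common subsequence (LCS) between the original text and the attacker's reconstruction. The sketch below is one common variant, assumed here for illustration: lowercased whitespace tokens and the F1 combination of LCS precision and recall. The paper's exact tokenisation and weighting may differ, and the function names are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """ROUGE-L F1 between a reference text and a reconstructed candidate."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

A higher score means the attack recovered more of the original token sequence in order, so for a fixed utility, a lower ROUGE-L indicates less leakage.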

Theorems & Definitions (7)

  • Definition 1
  • Definition 2
  • Theorem 1 (Weggenmann and Kerschbaum, 2021)
  • Corollary 1 (Dwork and Roth, 2014)
  • Theorem 2
  • Corollary 2
  • Corollary 3