Spurious Privacy Leakage in Neural Networks

Chenxiang Zhang, Jun Pang, Sjouke Mauw

TL;DR

This work addresses privacy risks arising from spurious correlations in real-world data by introducing spurious privacy leakage and a corresponding group privacy disparity under membership inference attacks. It applies LiRA-based MIAs to five real-world spurious datasets, evaluates spurious robust methods (DRO, DFR) and differential privacy, and analyzes multiple architecture families to understand privacy dynamics. The key findings show consistent subgroup privacy disparities that are not reliably mitigated by current robust training techniques, and that while differential privacy can improve worst-group protection, it often harms utility; architecture and pretraining also influence privacy auditing. These results underscore the need for fine-grained, group-level privacy auditing in biased data settings and point to directions for improving defenses and auditing practices in practical deployments.

Abstract

Neural networks trained on real-world data often exhibit biases while simultaneously being vulnerable to privacy attacks aimed at extracting sensitive information. Despite extensive research on each problem individually, their intersection remains poorly understood. In this work, we investigate the privacy impact of spurious correlation bias. We introduce spurious privacy leakage, a phenomenon in which spurious groups are significantly more vulnerable to privacy attacks than non-spurious groups. We observe that privacy disparity between groups increases in tasks with simpler objectives (e.g., fewer classes) due to spurious features. Counterintuitively, we demonstrate that spurious robust methods, designed to reduce spurious bias, fail to mitigate privacy disparity. Our analysis reveals that this occurs because robust methods can reduce reliance on spurious features for prediction, but do not prevent their memorization during training. Finally, we systematically compare the privacy of different model architectures trained with spurious data, demonstrating that, contrary to previous work, architectural choice can affect privacy evaluation.
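
The group privacy disparity described above is usually read off as the attack's true positive rate (TPR) at a fixed low false positive rate (FPR), computed separately for each group. The following is a minimal sketch of that bookkeeping, not the authors' evaluation code. The function names `tpr_at_fpr` and `per_group_tpr` are hypothetical, the membership scores are assumed to come from some attack (for example a LiRA-style likelihood ratio, which is not computed here), and choosing the threshold separately within each group is an assumption.

```python
import numpy as np


def tpr_at_fpr(scores, is_member, target_fpr=0.001):
    """TPR of a score-based attack at a threshold whose FPR is <= target_fpr.

    scores: higher means "more likely a training member" (assumed convention).
    is_member: ground-truth membership labels (truthy = member).
    """
    scores = np.asarray(scores, dtype=float)
    is_member = np.asarray(is_member, dtype=bool)
    member_scores = scores[is_member]
    non_member_scores = scores[~is_member]
    if member_scores.size == 0 or non_member_scores.size == 0:
        return float("nan")

    # Allow at most k false positives among the non-members.
    sorted_non = np.sort(non_member_scores)[::-1]
    k = int(np.floor(target_fpr * sorted_non.size))
    # Predict "member" when score > threshold; at most k non-members exceed it.
    threshold = sorted_non[k] if k < sorted_non.size else -np.inf
    return float(np.mean(member_scores > threshold))


def per_group_tpr(scores, is_member, groups, target_fpr=0.001):
    """Apply tpr_at_fpr within each group (integer group ids assumed)."""
    scores = np.asarray(scores, dtype=float)
    is_member = np.asarray(is_member, dtype=bool)
    groups = np.asarray(groups)
    return {
        int(g): tpr_at_fpr(scores[groups == g], is_member[groups == g], target_fpr)
        for g in np.unique(groups)
    }
```

A large gap between the per-group values returned by `per_group_tpr` is what the abstract calls privacy disparity. With few non-members in a group, very low FPR targets cannot be resolved, which is one reason a small group may need to be evaluated at a higher FPR.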

Paper Structure

This paper contains 20 sections, 6 equations, 14 figures, 9 tables.

Figures (14)

  • Figure 1: Attack success rate per group on Waterbirds, CelebA, MultiNLI, and CivilComments, respectively. Across the datasets, there is a spurious group (solid lines) with consistently higher privacy leakage than the non-spurious groups under the LiRA attack.
  • Figure 1: Comparing the privacy leakage of spurious robust methods per group. Although these methods improve the worst-group accuracy, DRO and DFR do not consistently mitigate the attack across datasets. *Waterbirds is evaluated at ${\approx}3\%$ FPR due to the limited samples in the spurious groups (see \ref{tab:tpr_full} for the complete results). Bolded values represent the best training method for privacy mitigation. The spurious groups are highlighted.
  • Figure 2: (a) Group privacy disparity increases as the target complexity decreases from FMoW62 to FMoW4. The solid line, representing the spurious group 2, remains constant while the other groups become less vulnerable. (b) Feature similarity of each group between FMoW62 and FMoW4 using linear CKA. The most similar group is the spurious group 2, colored in blue. (c) Feature complexity using explained variance for embeddings of models trained on FMoW62 and FMoW4. FMoW4 requires only 3 principal components, compared with 25 for FMoW62, to explain $\approx95\%$ of the variance. Additionally, spurious groups need fewer components than non-spurious groups.
  • Figure 2: Target model architecture accuracy on Waterbirds. Modern architectures are better at mitigating spurious correlation than older ones. However, the best-performing convolutional and transformer-based models show no significant difference in worst-group accuracy.
  • Figure 3: Memorization score per group for each spurious robust method on Waterbirds. Neither DRO nor DFR training can effectively mitigate data memorization.
  • ...and 9 more figures
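
The Figure 2 caption relies on two embedding-level measurements: linear CKA for feature similarity, and the number of principal components needed to explain about 95% of the variance for feature complexity. The sketch below shows the standard formulations of both, assuming embeddings are given as arrays of shape (samples, features). The function names are hypothetical, and the paper's own Definitions B.1 and B.2 may differ in detail.

```python
import numpy as np


def linear_cka(x, y):
    """Linear CKA between two embedding matrices with the same number of rows.

    Uses the standard form ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    on column-centered features.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


def components_for_variance(features, target=0.95):
    """Smallest number of principal components whose cumulative
    explained-variance ratio reaches `target`."""
    features = np.asarray(features, dtype=float)
    centered = features - features.mean(axis=0, keepdims=True)
    # Squared singular values are proportional to per-component variance.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variance = singular_values ** 2
    ratio = np.cumsum(variance) / variance.sum()
    # First index where the cumulative ratio is >= target, as a count.
    count = int(np.searchsorted(ratio, target)) + 1
    return min(count, ratio.size)
```

To mirror the per-group comparison in the caption, one would call `linear_cka` on the embeddings of the same group's samples from the two models, and `components_for_variance` on each model's embeddings per group.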

Theorems & Definitions (4)

  • Definition B.1: Feature Similarity
  • Definition B.2: Feature Complexity
  • Definition B.3: Label memorization
  • Definition D.1: Differential privacy
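
The listing above gives only the names of the definitions. For orientation, the standard textbook forms of the last two can be written as below. These are the widely used formulations of $(\varepsilon, \delta)$-differential privacy and of leave-one-out label memorization, stated here as an assumption about what Definitions D.1 and B.3 refer to; the paper's exact wording and notation may differ.

```latex
% (epsilon, delta)-differential privacy: a randomized mechanism M satisfies it
% if, for all datasets D, D' differing in one record and all measurable sets S,
\[
  \Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[\mathcal{M}(D') \in S] + \delta .
\]

% Label memorization of training example i = (x_i, y_i) by a learning
% algorithm A on dataset S, where S^{\setminus i} is S with example i removed:
\[
  \mathrm{mem}(\mathcal{A}, S, i)
  \;=\;
  \Pr_{h \leftarrow \mathcal{A}(S)}\bigl[h(x_i) = y_i\bigr]
  \;-\;
  \Pr_{h \leftarrow \mathcal{A}(S^{\setminus i})}\bigl[h(x_i) = y_i\bigr] .
\]
```

A memorization score near 1 means the model predicts the label of example $i$ correctly only when that example is in the training set, which is the quantity the Figure 3 caption reports per group.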