Differential Privacy in Two-Layer Networks: How DP-SGD Harms Fairness and Robustness

Ruichen Xu; Kexin Chen

TL;DR

A unified feature-centric framework to analyze the feature learning dynamics of differentially private stochastic gradient descent (DP-SGD) in two-layer ReLU convolutional neural networks reveals that the popular paradigm of public pre-training and private fine-tuning does not guarantee improvement, particularly under significant feature distribution shifts between datasets.

Abstract

Differentially private learning is essential for training models on sensitive data, but empirical studies consistently show that it can degrade performance, introduce fairness issues like disparate impact, and reduce adversarial robustness. The theoretical underpinnings of these phenomena in modern, non-convex neural networks remain largely unexplored. This paper introduces a unified feature-centric framework to analyze the feature learning dynamics of differentially private stochastic gradient descent (DP-SGD) in two-layer ReLU convolutional neural networks. Our analysis establishes test loss bounds governed by a crucial metric: the feature-to-noise ratio (FNR). We demonstrate that the noise required for privacy leads to suboptimal feature learning, and specifically show that: 1) imbalanced FNRs across classes and subpopulations cause disparate impact; 2) even in the same class, noise has a greater negative impact on semantically long-tailed data; and 3) noise injection exacerbates vulnerability to adversarial attacks. Furthermore, our analysis reveals that the popular paradigm of public pre-training and private fine-tuning does not guarantee improvement, particularly under significant feature distribution shifts between datasets. Experiments on synthetic and real-world data corroborate our theoretical findings.
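As an illustration of where the privacy noise mentioned in the abstract enters training, the following is a minimal NumPy sketch of one generic DP-SGD update (per-example gradient clipping followed by Gaussian noise). It is not the paper's algorithm or model: the function name `dp_sgd_step`, the flat parameter vector, and the parameters `clip_norm` and `noise_multiplier` are assumptions made for this example only.

```python
import numpy as np

def dp_sgd_step(w, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One generic DP-SGD update on a flat parameter vector (illustrative only).

    w                 : parameters, shape (d,)
    per_example_grads : one gradient per example, shape (batch, d)
    clip_norm         : hypothetical name for the per-example L2 clipping bound
    noise_multiplier  : hypothetical name for noise std divided by clip_norm
    """
    batch_size = per_example_grads.shape[0]
    # L2 norm of each example's gradient.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    # Shrink only the gradients whose norm exceeds clip_norm.
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale
    # Gaussian noise calibrated to the clipping bound.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=w.shape)
    # Average the noisy sum and take a gradient step.
    noisy_mean = (clipped.sum(axis=0) + noise) / batch_size
    return w - lr * noisy_mean

# Usage: two per-example gradients in two dimensions.
rng = np.random.default_rng(0)
grads = np.array([[3.0, 4.0], [0.0, 0.5]])
w_new = dp_sgd_step(np.zeros(2), grads, lr=1.0, clip_norm=1.0,
                    noise_multiplier=1.0, rng=rng)
```

In this sketch the noise scale depends only on `clip_norm` and `noise_multiplier`, not on how strong a given feature's gradient signal is, which is the intuition behind comparing feature strength to noise.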

Paper Structure (47 sections, 25 theorems, 88 equations, 4 figures, 5 tables, 2 algorithms)

Introduction
Related work
Notation
Model
Test loss analysis
Preliminary
Standard test loss analysis
Adversarial test loss analysis
Understanding DP-SGD impacts
Interpretation of disparate impact
Interpretation of adversarial robustness
Public-pretraining and private-finetuning
Solutions for improving feature-to-noise ratio
Experiments
Synthetic datasets
...and 32 more sections

Key Result

Theorem 3.4

Under the paper's condition and its assumption (labeled "non-perfect"), with probability at least $1-\delta$, for any $i\in \{1,2\}$ and $j\in \{\text{maj},\text{min}\}$, the test loss of a DP-SGD-trained model satisfies $\mathcal{L}_{\mathcal{D}_{i,j}}\!\left(\mathbf{W}^{(T)}\right) \le \bar{L}_{i,j}$.

Figures (4)

Figure 1: Illustration of the privacy-utility phase transition between benign and harmful privacy protection.
Figure 2: Model standard test loss and adversarial test loss.
Figure 3: Visualization of correctly classified (left) and misclassified (right) images.
Figure 4: Examples of using image padding to control feature sizes. For digits and objects, the images are padded with their background color.

Theorems & Definitions (39)

Remark 2.1
Definition 2.2: $(\epsilon, \alpha)$-Differential privacy
Definition 3.3
Theorem 3.4
Theorem 3.5
Definition 3.6: Adversarial example
Theorem 3.7
Corollary 4.1: Disparate impact of different classes
Corollary 4.2: Disparate impact of subpopulation groups
Proposition 4.4
...and 29 more
