Neural Collapse Meets Differential Privacy: Curious Behaviors of NoisyGD with Near-perfect Representation Learning

Chendi Wang; Yuqing Zhu; Weijie J. Su; Yu-Xiang Wang

TL;DR

DP fine-tuning is less robust than fine-tuning without DP, particularly in the presence of perturbations. The paper suggests several strategies to improve the robustness of DP fine-tuning, such as feature normalization or dimension reduction methods like Principal Component Analysis (PCA).

Abstract

A recent study by De et al. (2022) has reported that large-scale representation learning through pre-training on a public dataset significantly enhances differentially private (DP) learning in downstream tasks, despite the high dimensionality of the feature space. To theoretically explain this phenomenon, we consider the setting of a layer-peeled model in representation learning, which results in interesting phenomena related to learned features in deep learning and transfer learning, known as Neural Collapse (NC). Within the framework of NC, we establish an error bound indicating that the misclassification error is independent of dimension when the distance between actual features and the ideal ones is smaller than a threshold. Additionally, the quality of the features in the last layer is empirically evaluated under different pre-trained models within the framework of NC, showing that a more powerful transformer leads to a better feature representation. Furthermore, we reveal that DP fine-tuning is less robust compared to fine-tuning without DP, particularly in the presence of perturbations. These observations are supported by both theoretical analyses and experimental evaluation. Moreover, to enhance the robustness of DP fine-tuning, we suggest several strategies, such as feature normalization or employing dimension reduction methods like Principal Component Analysis (PCA). Empirically, we demonstrate a significant improvement in testing accuracy by conducting PCA on the last-layer features.
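The private fine-tuning setup described in the abstract can be sketched as noisy gradient descent (NoisyGD) on a linear classifier over frozen last-layer features. The sketch below is illustrative only and is not the authors' implementation: the function names `normalize_features` and `noisy_gd`, the per-example clipping step, and the noise calibration (Gaussian noise with standard deviation `clip * sqrt(steps / (2 * rho))` for a total budget of `rho`-zCDP under add/remove adjacency) are assumptions made for this example.

```python
import numpy as np

def normalize_features(X, eps=1e-12):
    """Scale each feature vector to unit Euclidean norm (feature normalization)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.maximum(norms, eps)

def noisy_gd(X, y, num_classes, rho, steps=1, lr=1.0, clip=1.0, seed=0):
    """Hypothetical NoisyGD sketch: softmax regression on fixed features.

    Assumption: the summed, clipped gradient has L2 sensitivity `clip`, so
    Gaussian noise with std clip * sqrt(steps / (2 * rho)) per step gives
    rho-zCDP in total by composition.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    theta = np.zeros((p, num_classes))          # zero initialization
    sigma = clip * np.sqrt(steps / (2.0 * rho))
    Y = np.eye(num_classes)[y]                  # one-hot labels
    x_norms = np.linalg.norm(X, axis=1)
    for _ in range(steps):
        logits = X @ theta
        logits = logits - logits.max(axis=1, keepdims=True)
        probs = np.exp(logits)
        probs = probs / probs.sum(axis=1, keepdims=True)
        resid = probs - Y
        # Per-example gradient is outer(x_i, resid_i); its Frobenius norm
        # equals ||x_i|| * ||resid_i||, so clipping needs no explicit outer product.
        g_norms = x_norms * np.linalg.norm(resid, axis=1)
        scale = np.minimum(1.0, clip / np.maximum(g_norms, 1e-12))
        grad_sum = X.T @ (resid * scale[:, None])
        noise = rng.normal(0.0, sigma, size=theta.shape)
        theta = theta - lr * (grad_sum + noise) / n
    return theta
```

Note that the noise is added to every coordinate of `theta`, so its total magnitude grows with the feature dimension `p`; this is the dimension dependence that the abstract's error bound addresses when features are close to the ideal ones.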

Paper Structure (66 sections, 8 theorems, 108 equations, 4 figures, 1 table)

Introduction
Contributions of this paper
Preliminaries and Problem Setup
Symbols and notations.
Differentially private learning.
Noisy gradient descent.
Private fine-tuning.
Theoretical setup for private fine-tuning.
Misclassification error.
Beyond the distribution-free theory.
Neural Collapse.
Bounds on misclassification errors and robustness in private fine-tuning
Bounds on misclassification errors
Stochastic shift vectors
Remark.
...and 51 more sections

Key Result

Theorem 3.1

Let $\widehat{\theta}_{\mathrm{GD}}$ be a predictor trained by GD under the cross entropy loss with zero initialization. Then, we have the following error bound on the misclassification error.

Figures (4)

Figure 1: The figure depicts the evolution of the feature layer outputs of a VGG-13 neural network when trained on CIFAR-10 with three randomly selected classes. Each class is represented by a distinct color in the small blue sphere. As the training progresses, the last-layer feature means collapse onto their corresponding classes. Credit to MR4250189.
Figure 2: CIFAR-10. A figure depicting the feature shift parameter $\beta$ when fine-tuning different pre-trained models on CIFAR-10. As observed, ViT performs better than ResNet-50, as the shift parameter is much smaller. The feature shift vectors are quite stochastic.
Figure 3: Empirical behaviors of NoisyGD under various robustness settings.
Figure 4: CIFAR-10. Applying PCA to both training and testing features before NoisySGD: keeping $K-1$ principal components improves NoisySGD's robustness.
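
The PCA step named in the Figure 4 caption can be sketched as follows. This is a minimal illustration, not the authors' code: the helper names `fit_pca` and `apply_pca` are hypothetical, the projection is fitted on the training features and reused for the test features, and any privacy cost of computing the projection itself is not accounted for here.

```python
import numpy as np

def fit_pca(X_train, num_components):
    """Return the training mean and the top principal directions (p x num_components)."""
    mean = X_train.mean(axis=0)
    # Rows of vt are the principal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(X_train - mean, full_matrices=False)
    return mean, vt[:num_components].T

def apply_pca(X, mean, components):
    """Center with the training mean and project onto the fitted directions."""
    return (X - mean) @ components

# Usage sketch with K classes: keep K - 1 components before private training.
# mean, comps = fit_pca(train_features, K - 1)
# train_low = apply_pca(train_features, mean, comps)
# test_low = apply_pca(test_features, mean, comps)
```

The choice of $K-1$ components reflects that $K$ class means, once centered, span at most a $(K-1)$-dimensional subspace, so perfectly collapsed features lose nothing under this projection while the noise added by private training acts in far fewer dimensions.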

Theorems & Definitions (12)

Definition 2.1: Zero-Concentrated Differential Privacy (zCDP; Bun and Steinke, 2016)
Definition 2.2: Sample complexity for private $\mathfrak{D}$-learnability
Definition 2.3: Feature shift parameter $\beta$
Theorem 3.1: misclassification error for GD
Theorem 3.2: misclassification error for NoisyGD
Theorem 3.3: misclassification error for NoisyGD
Theorem 3.4: Multiple iterations
Theorem 4.1
Theorem D.1
Definition D.2: Classification problem under Neural Collapse
...and 2 more
