DPAdapter: Improving Differentially Private Deep Learning through Noise Tolerance Pre-training

Zihao Wang; Rui Zhu; Dongruo Zhou; Zhikun Zhang; John Mitchell; Haixu Tang; XiaoFeng Wang

TL;DR

DPAdapter is a technique that improves the model performance of DPML algorithms by enhancing parameter robustness. It modifies and enhances the sharpness-aware minimization (SAM) technique, using a two-batch strategy that provides a more accurate perturbation estimate and an efficient gradient descent step.

Abstract

Recent developments have underscored the critical role of differential privacy (DP) in safeguarding individual data when training machine learning models. However, integrating DP often causes significant model performance degradation because of the perturbation it introduces into the training process. This is a major challenge for the field of differentially private machine learning (DPML). Several mitigations have been proposed, typically by designing new DPML algorithms or by relaxing DP definitions to suit specific contexts. Despite these efforts, the utility loss that DP induces, particularly in large-scale models, remains substantial and calls for a new approach that avoids this impairment. In response, we introduce DPAdapter, a technique that improves the model performance of DPML algorithms by enhancing parameter robustness. The intuition behind this strategy is that models with robust parameters are inherently more resistant to the noise introduced by DP, and therefore retain better performance despite the perturbations. DPAdapter modifies and enhances the sharpness-aware minimization (SAM) technique. It uses a two-batch strategy to obtain a more accurate perturbation estimate and an efficient gradient descent step, thereby improving parameter robustness against noise. Notably, DPAdapter acts as a plug-and-play component that can be combined with existing DPML algorithms to further improve their performance. Our experiments show that DPAdapter substantially improves state-of-the-art DPML algorithms, raising average accuracy from 72.92% to 77.09% with a privacy budget of $\epsilon=4$.
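
The abstract does not name the DPML algorithms that DPAdapter plugs into. For reference, the sketch below shows one step of DP-SGD, a widely used DPML algorithm, which applies per-example gradient clipping followed by Gaussian noise. It is a minimal NumPy sketch that makes several assumptions. The model is logistic regression. The helper names and hyperparameter values are hypothetical. Batches are drawn by fixed-size uniform sampling, which is a simplification. The sketch also omits the privacy accountant that would map the noise multiplier to a budget such as $\epsilon=4$.

```python
import numpy as np

def per_example_grads(w, X, y):
    """Per-example gradients of the logistic loss log(1 + exp(-y * x.w)), y in {-1, +1}."""
    margins = y * (X @ w)                   # shape (n,)
    coeff = -y / (1.0 + np.exp(margins))    # d loss_i / d (x_i . w)
    return coeff[:, None] * X               # shape (n, d)

def dp_sgd_step(w, X, y, lr, clip_norm, noise_multiplier, rng):
    """One DP-SGD update on a batch (hypothetical helper, not from the paper)."""
    grads = per_example_grads(w, X, y)
    # Clip each example's gradient to an L2 norm of at most clip_norm.
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    grads = grads / np.maximum(1.0, norms / clip_norm)
    # Add Gaussian noise scaled to the clipping norm, then average over the batch.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=w.shape)
    noisy_mean = (grads.sum(axis=0) + noise) / X.shape[0]
    return w - lr * noisy_mean

# Usage on synthetic data (values are illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 5))
y = np.where(X @ np.ones(5) > 0, 1.0, -1.0)
w = np.zeros(5)
for _ in range(100):
    batch = rng.choice(len(X), size=64, replace=False)
    w = dp_sgd_step(w, X[batch], y[batch], lr=0.5, clip_norm=1.0,
                    noise_multiplier=1.0, rng=rng)
```

The clipping bounds each example's influence on the update, and the noise hides that bounded influence. The same noise also degrades utility, and DPAdapter's pre-training is aimed at that degradation.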

Paper Structure (26 sections, 5 theorems, 23 equations, 7 figures, 1 table, 2 algorithms)

Introduction
Background
Transfer Learning
Differentially Private Machine Learning
Adversarial Robustness and Model Parameter Robustness
Problem Formulation and Key Observations
Problem Formulation
Motivation and Key Observations
DPAdapter
Design Challenge
Design Intuition
Methodology Overview
Design Details
Theoretical Analyses
Evaluation
...and 11 more sections

Key Result

Theorem 1

With proper selection of parameters, our algorithm enjoys the following gradient norm bound: where $\hat{\beta} = (\rho^2\beta_2 + \beta\beta_1)$.

Figures (7)

Figure 1: Impact of perturbation magnitude on AMP.
Figure 2: The relationship between the parameter robustness of the pretrained models and that of the models fine-tuned on the downstream tasks.
Figure 3: Impact of different batch sizes on AMP.
Figure 4: Overview of the DPAdapter approach. Step 1, standard training, trains the model on the training data in the usual way. This is analogous to the warm-up phase common in adversarial training (AT), which first brings the model's accuracy to a standard level and yields the model $f_{\theta}$. In step 2, we select a batch $\mathcal{B}_1$ with a large batch size from the training data and use it to compute the worst-case model perturbation $\Delta$ via the perturbation equation (①), producing the model $f_{\theta+\Delta}$. In step 3, we use another batch $\mathcal{B}_2$ to perform SGD on $f_{\theta+\Delta}$ (②), yielding $f_{\theta+\Delta+\alpha}$. In step 4, we reverse the step-2 perturbation in $f_{\theta+\Delta+\alpha}$ (③), obtaining $f_{\theta+\alpha}$. Steps 2 to 4 are repeated (④), and once the desired number of iterations is reached, the resulting model is returned (⑤). This final model serves as the starting point for subsequent fine-tuning, so that models built on it perform better when trained with a DPML optimization algorithm (⑥). This protects the privacy of the fine-tuning dataset and also notably improves DPML performance.
Figure 5: Impact of perturbation magnitude on DPAdapter.
...and 2 more figures
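
Steps 2 to 4 of Figure 4 can be sketched in NumPy as follows. This is a minimal sketch under several assumptions:

- The perturbation equation referenced in the caption is not reproduced on this page, so step 2 substitutes the standard SAM ascent step $\Delta = \rho\,\nabla L_{\mathcal{B}_1}(\theta)/\lVert\nabla L_{\mathcal{B}_1}(\theta)\rVert$.
- The model is logistic regression.
- Step 1, standard training, is skipped by starting from zero weights.
- All function names and hyperparameters (`rho`, `lr`, `big_batch`, `small_batch`) are hypothetical.

```python
import numpy as np

def logistic_loss_grad(w, X, y):
    """Mean logistic loss log(1 + exp(-y * x.w)) and its gradient, y in {-1, +1}."""
    margins = y * (X @ w)
    loss = np.mean(np.logaddexp(0.0, -margins))
    grad = X.T @ (-y / (1.0 + np.exp(margins))) / X.shape[0]
    return loss, grad

def two_batch_sam_step(w, batch1, batch2, rho, lr):
    """Figure 4, steps 2-4, with an assumed SAM-style perturbation (hypothetical helper)."""
    X1, y1 = batch1
    X2, y2 = batch2
    # Step 2: worst-case perturbation estimated on the large batch B1.
    _, g1 = logistic_loss_grad(w, X1, y1)
    delta = rho * g1 / (np.linalg.norm(g1) + 1e-12)
    # Step 3: SGD step computed on a second batch B2 at the perturbed weights.
    _, g2 = logistic_loss_grad(w + delta, X2, y2)
    alpha = -lr * g2
    # Step 4: remove the perturbation: (w + delta + alpha) - delta = w + alpha.
    return w + alpha

def train(X, y, rho=0.05, lr=0.5, big_batch=128, small_batch=32, iters=200, seed=0):
    """Repeat steps 2-4; the step-1 warm-up is skipped in this sketch."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        i1 = rng.choice(len(X), size=big_batch, replace=False)    # batch B1
        i2 = rng.choice(len(X), size=small_batch, replace=False)  # batch B2
        w = two_batch_sam_step(w, (X[i1], y[i1]), (X[i2], y[i2]), rho, lr)
    return w

# Usage on synthetic data (values are illustrative only).
rng = np.random.default_rng(1)
X = rng.normal(size=(512, 10))
y = np.where(X @ rng.normal(size=10) > 0, 1.0, -1.0)
w = train(X, y)
```

Step 4 means the net update $\alpha$ is a gradient computed at the perturbed point $\theta+\Delta$ but applied at $\theta$. This is what separates a SAM-style update from plain SGD. Estimating $\Delta$ on the larger batch $\mathcal{B}_1$ corresponds to the abstract's claim of a more accurate perturbation estimate.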

Theorems & Definitions (9)

Definition 2.1: ($\epsilon$, $\delta$)-DP
Theorem 1: Informal
Theorem 2: Informal
Theorem 3: Formal version of Theorem 1
Lemma A.1
Proof
Proof of Theorem 3
Theorem 4: Formal version of Theorem 2
Proof of Theorem 4
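
Only the heading of Definition 2.1 survives on this page. For reference, the standard form of $(\epsilon,\delta)$-DP is given below, together with the standard Gaussian-mechanism calibration that noise-adding DPML algorithms build on. The paper's own statement may differ in notation or in how neighboring datasets are defined.

```latex
% (epsilon, delta)-DP: for all neighboring datasets D, D' (differing in one record)
% and all measurable S in the range of the randomized mechanism M,
\Pr[\mathcal{M}(D) \in S] \le e^{\epsilon}\,\Pr[\mathcal{M}(D') \in S] + \delta .

% Gaussian mechanism: releasing f(D) + N(0, \sigma^2 I) is (epsilon, delta)-DP for
% epsilon in (0, 1) when, with L2 sensitivity \Delta_2 f = \max_{D \sim D'} \lVert f(D) - f(D') \rVert_2,
\sigma \ge \frac{\sqrt{2 \ln(1.25/\delta)}\;\Delta_2 f}{\epsilon} .
```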