
DOPPLER: Differentially Private Optimizers with Low-pass Filter for Privacy Noise Reduction

Xinwei Zhang, Zhiqi Bu, Mingyi Hong, Meisam Razaviyayn

TL;DR

The paper addresses a core problem of DP training: the injected noise degrades model performance, especially for large-scale models. It introduces DOPPLER, a low-pass filter applied to privatized gradients, grounded in a frequency-domain analysis that treats the sequence of gradients across iterations as a low-frequency signal and the DP noise as high-frequency (white) noise. The authors provide convergence and privacy guarantees for DPSGD with DOPPLER and report empirical gains of 3%-10% in test accuracy across diverse models and datasets, narrowing the gap between DP and non-DP training. The filter is compatible with existing DP optimizers, offering a practical, theoretically justified way to improve DP training and to bring it closer to foundation-model-scale tasks.
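A toy illustration of the frequency-domain intuition, under stated assumptions: a slowly varying cosine stands in for the gradient sequence (low frequency), i.i.d. Gaussian samples stand in for the DP noise (white, flat spectrum), and a first-order exponential moving average stands in for the low-pass filter. The signal, the helper names, and `beta=0.9` are illustrative choices, not the paper's filter or data.

```python
import math
import random

def ema_filter(xs, beta):
    """First-order IIR low-pass filter (exponential moving average) with bias correction."""
    out, m = [], 0.0
    for t, x in enumerate(xs, start=1):
        m = beta * m + (1.0 - beta) * x
        out.append(m / (1.0 - beta ** t))  # undo the bias from initializing m at 0
    return out

def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

rng = random.Random(0)
steps = 2000
# Stand-in for the gradient: changes slowly across iterations (low frequency).
signal = [math.cos(2 * math.pi * t / 500) for t in range(steps)]
# Stand-in for DP noise: i.i.d. Gaussian, i.e. white noise.
noisy = [s + rng.gauss(0.0, 1.0) for s in signal]
filtered = ema_filter(noisy, beta=0.9)

print(f"MSE without filter: {mse(noisy, signal):.3f}")
print(f"MSE with filter:    {mse(filtered, signal):.3f}")
```

The trade-off is visible by varying `beta`: a larger value averages over more iterations and suppresses more noise, but also makes the filtered sequence lag further behind the true signal.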

Abstract

Privacy is a growing concern in modern deep-learning systems and applications. Differentially private (DP) training prevents trained machine learning models from leaking sensitive information about their training data. DP optimizers, including DP stochastic gradient descent (DPSGD) and its variants, privatize the training procedure through gradient clipping and DP noise injection. In practice, however, models trained with DPSGD and its variants often suffer significant performance degradation, which prevents the use of DP optimization in many key tasks, such as foundation model pretraining. In this paper, we provide a novel signal processing perspective on the design and analysis of DP optimizers. We show that a "frequency domain" operation called low-pass filtering can effectively reduce the impact of DP noise. More specifically, by defining the "frequency domain" for both the gradient and the DP noise, we develop a new component for DP algorithms, called DOPPLER, which effectively amplifies the gradient while suppressing DP noise within this frequency domain. As a result, it maintains privacy guarantees and improves the quality of the DP-protected model. Our experiments show that DP optimizers equipped with the low-pass filter outperform their counterparts without it by 3%-10% in test accuracy across various models and datasets. Both theoretical and empirical evidence suggest that DOPPLER is effective in closing the gap between DP and non-DP training.
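A minimal sketch of one DPSGD step followed by a low-pass filter, assuming the standard DPSGD privatization of Abadi et al. (clip each per-sample gradient to norm at most C, sum, add Gaussian noise with standard deviation sigma*C, average) and assuming a first-order exponential moving average as the filter. The names `clip`, `privatize`, `lp_dpsgd_step`, and the parameter `beta` are hypothetical; DOPPLER's actual filter may use different coefficients. Because the filter only reads already-privatized gradients, the post-processing property of DP means it adds no privacy cost.

```python
import math
import random

def clip(grad, c):
    """Scale a per-sample gradient so that its L2 norm is at most c."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, c / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]

def privatize(per_sample_grads, c, sigma, rng):
    """Clip each sample, sum, add N(0, (sigma*c)^2) noise per coordinate, then average."""
    b = len(per_sample_grads)
    dim = len(per_sample_grads[0])
    total = [0.0] * dim
    for grad in per_sample_grads:
        for i, g in enumerate(clip(grad, c)):
            total[i] += g
    return [(total[i] + rng.gauss(0.0, sigma * c)) / b for i in range(dim)]

def lp_dpsgd_step(x, m, per_sample_grads, lr, c, sigma, beta, rng):
    """One step: privatize the gradient, low-pass filter it, then take a descent step."""
    g_priv = privatize(per_sample_grads, c, sigma, rng)
    # Assumed filter: exponential moving average (a first-order IIR low-pass filter).
    m = [beta * mi + (1.0 - beta) * gi for mi, gi in zip(m, g_priv)]
    x = [xi - lr * mi for xi, mi in zip(x, m)]
    return x, m

# Usage: two per-sample gradients; the first (norm 5) is clipped to norm 1.
rng = random.Random(0)
x, m = [0.0, 0.0], [0.0, 0.0]
batch = [[3.0, 4.0], [0.3, 0.4]]
x, m = lp_dpsgd_step(x, m, batch, lr=0.1, c=1.0, sigma=1.0, beta=0.9, rng=rng)
```

Setting `beta=0` recovers plain DPSGD, which makes the filter easy to ablate inside an existing training loop.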

Paper Structure

This paper contains 27 sections, 3 theorems, 18 equations, 13 figures, 4 tables, and 3 algorithms.

Key Result

Theorem 1

Given $N$, $B$, $T$, and $C$ (the number of training samples, the batch size, the number of iterations, and the clipping threshold), there exist positive constants $u, v$ such that for any $\epsilon < \frac{uB^2T}{N^2}$ and $\delta > 0$, choosing $\sigma_\textup{DP}^2 \geq v\frac{C^2T\ln(\frac{1}{\delta})}{N^2\epsilon^2}$ guarantees that the DPSGD algorithm is $(\epsilon,\delta)$-DP.
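The bound in Theorem 1 can be evaluated directly to see how the required noise scales. This is a sketch under assumptions: the constants $u$ and $v$ are not given here, so both are set to 1.0 as placeholders, and the setting ($N$, $B$, $T$, $C$, $\delta$) is hypothetical. The absolute numbers are therefore meaningless; only the scaling is informative.

```python
import math

def min_noise_variance(n, t, c, eps, delta, v):
    """Lower bound on sigma_DP^2 from Theorem 1: v * C^2 * T * ln(1/delta) / (N^2 * eps^2)."""
    return v * c ** 2 * t * math.log(1.0 / delta) / (n ** 2 * eps ** 2)

def eps_admissible(n, b, t, eps, u):
    """Theorem 1 only covers eps < u * B^2 * T / N^2."""
    return eps < u * b ** 2 * t / n ** 2

# Hypothetical setting; U and V are placeholder values for the unknown constants.
N, B, T, C, DELTA = 50_000, 1_000, 5_000, 1.0, 1e-5
U = V = 1.0

for eps in (0.5, 1.0, 8.0):
    ok = eps_admissible(N, B, T, eps, U)
    var = min_noise_variance(N, T, C, eps, DELTA, V)
    print(f"eps={eps}: admissible={ok}, sigma_DP^2 >= {var:.3e}")
```

The bound scales as $1/\epsilon^2$ and $1/N^2$: halving $\epsilon$ quadruples the required noise variance, while doubling the dataset size cuts it by a factor of four. Whether a particular $\epsilon$ is covered at all depends on the unknown constant $u$.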

Figures (13)

  • Figure 1: An illustration of the auto-correlation $\phi(\tau)$ and power spectral density $P(\nu)$ of the gradient sequence $\{\nabla F(\mathbf{x}^t)\}$ and the noise $\{\mathbf{w}^t\}$, where $\phi_{\nabla f}$ decays proportional to $\tau^2$ and $\mathbf{w}^t$ is white noise. Panel (c) illustrates how an ideal low-pass filter removes the high-frequency noise and keeps the low-frequency signal.
  • Figure 2: The recorded PSD of Gaussian noise $\{\mathbf{w}^t\}$ and of the stochastic gradients of SGD and LP-SGD when training ResNet-50 on the CIFAR-10 dataset.
  • Figure 3: Comparison between DPSGD and LP-DPSGD for pre-training on different datasets.
  • Figure 4: Comparison between DP optimizers with and without low-pass filters for pre-training with different values of $\epsilon$ on the CIFAR-10 dataset.
  • Figure 5: Illustration of the low-pass filter.
  • ...and 8 more figures
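A minimal sketch of how a PSD like those in Figures 1 and 2 can be estimated: a naive periodogram of a scalar sequence, for example one gradient coordinate tracked across iterations. Synthetic white Gaussian noise and its exponentially averaged version stand in for real gradients; the helper names and `beta=0.9` are illustrative assumptions, and this is not necessarily the estimator used in the paper.

```python
import cmath
import math
import random

def periodogram(xs):
    """|DFT(x)|^2 / n at frequency bins k = 0..n//2 (naive O(n^2) DFT, mean removed)."""
    n = len(xs)
    mean = sum(xs) / n
    centered = [x - mean for x in xs]
    psd = []
    for k in range(n // 2 + 1):
        s = sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(centered))
        psd.append(abs(s) ** 2 / n)
    return psd

def ema(xs, beta):
    """First-order low-pass filter: m_t = beta * m_{t-1} + (1 - beta) * x_t."""
    out, m = [], 0.0
    for x in xs:
        m = beta * m + (1.0 - beta) * x
        out.append(m)
    return out

def band_power(psd, lo, hi):
    """Average PSD over the bin-index range [lo, hi)."""
    return sum(psd[lo:hi]) / (hi - lo)

rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(256)]
psd_noise = periodogram(noise)
psd_filtered = periodogram(ema(noise, beta=0.9))

q = len(psd_noise) // 4  # compare the lowest and highest quarters of the spectrum
for name, psd in (("white noise", psd_noise), ("filtered", psd_filtered)):
    low = band_power(psd, 1, q + 1)  # bin 0 is skipped: it is zero after mean removal
    high = band_power(psd, len(psd) - q, len(psd))
    print(f"{name}: low-band power {low:.3f}, high-band power {high:.4f}")
```

White noise gives roughly equal power in both bands (a flat spectrum), while the filtered sequence concentrates its power at low frequencies, which is the qualitative picture sketched in Figure 1.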

Theorems & Definitions (5)

  • Definition 1: $(\epsilon,\delta)$-DP [Dwork & Roth, 2014]
  • Definition 2: Gaussian Mechanism [Zhao et al., 2019]
  • Theorem 1: Privacy Guarantee [Abadi et al., 2016]
  • Theorem 2: Convergence
  • Theorem 3: Privacy-utility trade-off
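For reference, the standard textbook forms of the two definitions listed above are sketched below, following Dwork & Roth (2014). The paper's own statements may differ in details such as the noise calibration used for the Gaussian mechanism.

```latex
% (epsilon, delta)-DP: for all neighboring datasets D, D' (differing in one record)
% and all measurable output sets S,
\Pr[\mathcal{M}(D) \in S] \le e^{\epsilon}\,\Pr[\mathcal{M}(D') \in S] + \delta .

% Gaussian mechanism for a query f with L2 sensitivity \Delta_2 f:
\Delta_2 f = \max_{D \sim D'} \lVert f(D) - f(D') \rVert_2, \qquad
\mathcal{M}(D) = f(D) + \mathcal{N}(0, \sigma^2 I).

% Classic sufficient calibration (Dwork & Roth, Thm. A.1), valid for \epsilon \in (0, 1):
\sigma \ge \frac{\Delta_2 f \sqrt{2 \ln(1.25/\delta)}}{\epsilon}.
```

In DPSGD, clipping every per-sample gradient to norm at most $C$ is what bounds the L2 sensitivity of the summed gradient by $C$ (under add/remove-one neighboring datasets), which is why $C$ appears in the noise bound of Theorem 1.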