Differentially Private SGD Without Clipping Bias: An Error-Feedback Approach
Xinwei Zhang, Zhiqi Bu, Zhiwei Steven Wu, Mingyi Hong
TL;DR
DiceSGD tackles the clipping bias problem in differentially private SGD by incorporating a clipped error-feedback mechanism that debiases the gradient updates while preserving $(\epsilon,\delta)$-DP via Gaussian noise. The method provides convergence guarantees for non-convex, Lipschitz-smooth objectives using a tailored Rényi-DP analysis and yields a utility bound of $\mathcal{O}(1/\sqrt{T})$, at the cost of slightly larger DP noise because the error state itself is not privatized. The paper proves that clipping thresholds can be chosen independently of problem constants, circumventing the traditional clipping-tuning issue, and demonstrates superior empirical performance over DPSGD-GC on CIFAR-10/100 and on E2E with GPT-2. Overall, DiceSGD offers a practical, principled approach to private training that preserves performance while maintaining strong privacy guarantees.
Abstract
Differentially Private Stochastic Gradient Descent with Gradient Clipping (DPSGD-GC) is a powerful tool for training deep learning models using sensitive data, providing both a solid theoretical privacy guarantee and high efficiency. However, using DPSGD-GC to ensure Differential Privacy (DP) comes at the cost of model performance degradation due to DP noise injection and gradient clipping. Existing research has extensively analyzed the theoretical convergence of DPSGD-GC and has shown that it converges only when using large clipping thresholds that depend on problem-specific parameters. Unfortunately, these parameters are often unknown in practice, making it hard to choose the optimal clipping threshold. Therefore, in practice, DPSGD-GC suffers from degraded performance due to the constant bias introduced by the clipping. In our work, we propose a new error-feedback (EF) DP algorithm as an alternative to DPSGD-GC, which not only offers a diminishing utility bound without inducing a constant clipping bias but, more importantly, allows for an arbitrary choice of clipping threshold that is independent of the problem. We establish an algorithm-specific DP analysis for our proposed algorithm, providing privacy guarantees based on Rényi DP. Additionally, we demonstrate that under mild conditions, our algorithm can achieve nearly the same utility bound as DPSGD without gradient clipping. Our empirical results on the CIFAR-10/100 and E2E datasets show that the proposed algorithm achieves higher accuracies than DPSGD while maintaining the same level of DP guarantee.
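To make the clipping-bias argument concrete, the following is a minimal sketch of a clipped error-feedback step on a toy one-dimensional problem. The exact update rule, the function names (`clip`, `ef_step`, `run`), the thresholds `c1` and `c2`, and the toy data are all assumptions chosen for illustration; the abstract does not specify the algorithm at this level of detail. The noise scale `sigma` is set to zero so that the effect of clipping can be seen in isolation, and no privacy accounting is performed.

```python
import numpy as np

def clip(v, c):
    """Rescale v so that its L2 norm is at most c."""
    norm = np.linalg.norm(v)
    return v * min(1.0, c / (norm + 1e-12))

def ef_step(x, e, grads, lr, c1, c2, sigma, rng):
    """One clipped error-feedback step (illustrative update rule, an assumption)."""
    # Average of per-sample gradients, each clipped to norm c1.
    clipped_mean = np.mean([clip(g, c1) for g in grads], axis=0)
    # Add back a clipped copy of the accumulated clipping error.
    v = clipped_mean + clip(e, c2)
    # Gaussian noise on the released update (sigma is a free parameter here).
    noise = rng.normal(0.0, sigma, size=x.shape)
    x_new = x - lr * (v + noise)
    # The error state accumulates what clipping removed; it is never released.
    e_new = e + grads.mean(axis=0) - v
    return x_new, e_new

def run(c2, steps=500, lr=0.1, c1=1.0, sigma=0.0, seed=0):
    """Minimize the mean of 0.5 * (x - a_i)^2 over toy targets a_i."""
    rng = np.random.default_rng(seed)
    targets = np.array([[0.0], [0.0], [6.0]])  # true minimizer is the mean, 2.0
    x, e = np.zeros(1), np.zeros(1)
    for _ in range(steps):
        grads = x - targets  # per-sample gradients, shape (3, 1)
        x, e = ef_step(x, e, grads, lr, c1, c2, sigma, rng)
    return float(x[0])

# c2 = 0 zeroes the feedback term, which recovers plain clipped gradient descent.
x_clipped = run(c2=0.0)
x_feedback = run(c2=1.0)
print(f"clipped only: {x_clipped:.3f}, with error feedback: {x_feedback:.3f}")
```

In this toy setup the clipped-only run settles near 0.5, away from the true minimizer 2.0, because the outlying sample's gradient is always truncated. The run with error feedback settles near 2.0 using the same small threshold, which is the behavior the abstract describes as removing the constant clipping bias.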
