Training with Differential Privacy: A Gradient-Preserving Noise Reduction Approach with Provable Security
Haodi Wang, Tangyu Jiang, Yu Guo, Chengjun Cai, Cong Wang, Xiaohua Jia
TL;DR
GReDP is a gradient-preserving differentially private (DP) training method. It computes model gradients in the frequency domain via the fast Fourier transform (FFT), adds Gaussian noise there, and recovers the gradients by applying the inverse transform and retaining only the real part. The theoretical analysis shows that the method requires only half the noise scale of DPSGD while preserving the full gradient information, which addresses a key utility gap in DP-based deep learning. Experiments on MNIST and CIFAR-10 across multiple architectures show consistent, substantial accuracy gains over DPSGD and Spectral-DP under the same privacy budgets. The approach therefore offers a practical, provably secure DP training mechanism, and an open-source implementation is available for broader adoption.
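The pipeline above can be made concrete with a minimal NumPy sketch of one noisy aggregation step, shown next to the standard DPSGD step (per-example clipping, summation, Gaussian noise, averaging). The sketch makes several assumptions. It flattens the summed, clipped gradient into a 1-D vector before the FFT. It also adds independent Gaussian noise with a caller-chosen standard deviation `noise_std` to the real and imaginary parts of each frequency coefficient. All function names are hypothetical. The sketch does not reproduce the noise calibration behind GReDP's claimed halving relative to DPSGD, which comes from the paper's analysis.

```python
import numpy as np


def clip_per_sample(grads: np.ndarray, clip_norm: float) -> np.ndarray:
    """Rescale each per-example gradient (one row each) to L2 norm <= clip_norm."""
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))  # avoid divide-by-zero
    return grads * scale


def dpsgd_step(grads, clip_norm, noise_multiplier, rng):
    """Standard DPSGD aggregation: clip, sum, add N(0, (sigma * C)^2), average."""
    total = clip_per_sample(grads, clip_norm).sum(axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / grads.shape[0]


def freq_domain_step(grads, clip_norm, noise_std, rng):
    """Hypothetical frequency-domain variant (illustrative, not GReDP itself):
    FFT, add complex Gaussian noise, inverse FFT, keep only the real part."""
    # ASSUMPTION: the aggregated gradient is a flattened 1-D vector.
    total = clip_per_sample(grads, clip_norm).sum(axis=0)
    # norm="ortho" makes the FFT unitary, so the L2 norm is preserved.
    spectrum = np.fft.fft(total, norm="ortho")
    # ASSUMPTION: i.i.d. noise on real and imaginary parts with std noise_std.
    noise = (rng.normal(0.0, noise_std, size=spectrum.shape)
             + 1j * rng.normal(0.0, noise_std, size=spectrum.shape))
    recovered = np.fft.ifft(spectrum + noise, norm="ortho")
    # The clean gradient is real, so the imaginary part is noise only; drop it.
    return recovered.real / grads.shape[0]


rng = np.random.default_rng(0)
per_sample = rng.normal(size=(32, 10))  # 32 examples, 10 parameters
g_dpsgd = dpsgd_step(per_sample, clip_norm=1.0, noise_multiplier=1.0, rng=rng)
g_freq = freq_domain_step(per_sample, clip_norm=1.0, noise_std=1.0, rng=rng)
```

The code uses `norm="ortho"` to make the transform unitary. By Parseval's theorem, the L2 clipping bound C therefore carries over unchanged as the sensitivity in the frequency domain. The real part can be kept on its own because the inverse transform of a real gradient's spectrum is real, which leaves only noise in the imaginary component.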
Abstract
Deep learning models have been widely adopted across many domains because they can learn hierarchical feature representations, which depend heavily on the training data and procedure. Protecting the training process and the learning algorithm is therefore paramount for privacy preservation. Differential Privacy (DP), a rigorous privacy framework, has achieved satisfactory results in deep learning training. Even so, existing schemes still fall short in preserving model utility: they either require a high noise scale or inevitably harm the original gradients. To address these issues, we present GReDP, a more robust and provably secure approach to differentially private training. Specifically, we compute the model gradients in the frequency domain and adopt a new approach to reduce the noise level. Unlike previous work, GReDP requires only half the noise scale of DPSGD [1] while keeping all gradient information intact. We analyze our method both theoretically and empirically, and the experimental results show that GReDP consistently outperforms the baselines across all models and training settings.
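The abstract contrasts schemes that harm the original gradients with GReDP, which keeps all gradient information intact. The NumPy sketch below illustrates only that distinction and adds no noise. A full FFT and inverse-FFT round trip that keeps the real part returns a real gradient exactly. Zeroing the high-frequency coefficients, by contrast, discards information irreversibly; this is a filtering step of the kind a spectral-filtering baseline might apply. The helper names and the cutoff `keep` are hypothetical, and the code is not the paper's algorithm.

```python
import numpy as np


def fft_roundtrip_real(g: np.ndarray) -> np.ndarray:
    """FFT, inverse FFT, keep the real part. No coefficients are dropped,
    so a real-valued gradient is recovered exactly (up to float error)."""
    return np.fft.ifft(np.fft.fft(g)).real


def lowpass_roundtrip(g: np.ndarray, keep: int) -> np.ndarray:
    """Hypothetical filtering step: keep only the `keep` lowest-frequency
    coefficients of the real FFT, zero the rest, then invert."""
    spectrum = np.fft.rfft(g)
    spectrum[keep:] = 0.0  # discard high-frequency components
    return np.fft.irfft(spectrum, n=g.shape[0])


rng = np.random.default_rng(0)
grad = rng.normal(size=64)  # stand-in for a flattened gradient
err_full = np.max(np.abs(fft_roundtrip_real(grad) - grad))         # floating-point error only
err_filtered = np.max(np.abs(lowpass_roundtrip(grad, keep=8) - grad))  # information lost
```

For a length-64 real signal, `rfft` yields 33 coefficients. Setting `keep=33` therefore makes the filtered round trip lossless as well, while any smaller cutoff removes part of the gradient.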
