DP-SGD-Global-Adapt-V2-S: Triad Improvements of Privacy, Accuracy and Fairness via Step Decay Noise Multiplier and Step Decay Upper Clipping Threshold

Sai Venkatesh Chilukoti, Md Imran Hossen, Liqun Shan, Vijay Srinivas Tida, Mahathir Mohammad Bappy, Wenmeng Tian, Xiali Hei

TL;DR

This work targets the trade-offs between privacy, accuracy, and fairness in DP-SGD by identifying convergence issues in DP-SGD-Global-Adapt and proposing DP-SGD-Global-Adapt-V2-S, which combines step-decay noise with step-decay clipping and DP-PSAC clipping when needed. The approach is formalized with decay schedulers (linear, time, and step) and a tCDP-based privacy accountant to track cumulative privacy loss across epochs. Empirical results across MNIST, CIFAR-10, CIFAR-100, unbalanced MNIST, and Thinwall show that step-decay noise yields faster convergence and higher utility, while enhancing fairness as measured by reduced privacy cost gaps. The work also provides explicit mathematical derivations for privacy budgets under various decay schemes and offers practical guidance for hyperparameter selection, contributing to robust, privacy-preserving training in sensitive domains such as additive manufacturing.

Abstract

Differentially Private Stochastic Gradient Descent (DP-SGD) has become a widely used technique for safeguarding sensitive information in deep learning applications. Unfortunately, DP-SGD's per-sample gradient clipping and uniform noise addition during training can significantly degrade model utility and fairness. We observe that the average gradient norm of the recent DP-SGD-Global-Adapt stays the same throughout training, so integrating it with the existing linear-decay noise multiplier brings little or no advantage. Moreover, we notice that its upper clipping threshold increases exponentially towards the end of training, potentially impacting the model's convergence. Other algorithms (DP-PSAC, Auto-S, DP-SGD-Global, and DP-F) have utility and fairness similar to or worse than DP-SGD, as our experiments demonstrate. To overcome these problems and improve utility and fairness, we develop DP-SGD-Global-Adapt-V2-S, which uses a step-decay noise multiplier and an upper clipping threshold that is also decayed step-wise. With a privacy budget ($ε$) of 1, DP-SGD-Global-Adapt-V2-S improves accuracy by 0.9795%, 0.6786%, and 4.0130% on MNIST, CIFAR10, and CIFAR100, respectively. It also reduces the privacy cost gap ($π$) by 89.8332% and 60.5541% on the unbalanced MNIST and Thinwall datasets, respectively. Finally, we develop mathematical expressions to compute the privacy budget using truncated concentrated differential privacy (tCDP) for DP-SGD-Global-Adapt-V2-T and DP-SGD-Global-Adapt-V2-S.

Paper Structure

This paper contains 32 sections, 4 theorems, 48 equations, 10 figures, 21 tables, and 1 algorithm.

Key Result

Lemma 1

The Gaussian mechanism satisfies $\left(\frac{C^{2}}{2\sigma^{2}}, \infty\right)$-tCDP.
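
As a rough illustration of how Lemma 1 might be used, the sketch below computes the per-release cost $\rho = C^{2}/(2\sigma^{2})$, sums it over several releases, and converts the total to an $(ε, δ)$ guarantee. It assumes that $C$ is the L2 sensitivity (the clipping threshold) and $\sigma$ the standard deviation of the Gaussian noise. It also relies on two standard facts that are not stated on this page: with $\omega = \infty$ the guarantee coincides with zCDP, so $\rho$ adds up under composition, and $\rho$-zCDP implies $(\rho + 2\sqrt{\rho \ln(1/δ)}, δ)$-DP. The function names are hypothetical, and subsampling amplification is ignored, so this is not the paper's accountant.

```python
import math


def gaussian_rho(clip_norm: float, sigma: float) -> float:
    """Per-release rho from Lemma 1: C^2 / (2 * sigma^2).

    Assumption: clip_norm is the L2 sensitivity C and sigma is the
    standard deviation of the added Gaussian noise.
    """
    if clip_norm <= 0 or sigma <= 0:
        raise ValueError("clip_norm and sigma must be positive")
    return clip_norm ** 2 / (2.0 * sigma ** 2)


def compose_rho(rhos) -> float:
    """Total rho of a sequence of releases (additive composition)."""
    return math.fsum(rhos)


def rho_to_epsilon(rho: float, delta: float) -> float:
    """Convert rho (with omega = infinity) to epsilon at a given delta."""
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    return rho + 2.0 * math.sqrt(rho * math.log(1.0 / delta))


if __name__ == "__main__":
    # Illustrative values only: a noise level that changes between releases.
    sigmas = [8.0, 8.0, 4.0, 4.0]
    total = compose_rho(gaussian_rho(1.0, s) for s in sigmas)
    print(f"total rho = {total:.6f}")
    print(f"epsilon at delta=1e-5: {rho_to_epsilon(total, 1e-5):.4f}")
```

Because each release is costed separately, a noise level that changes during training only changes the terms being summed.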

Figures (10)

  • Figure 1: Upper clipping threshold (strict max grad norm) at every training iteration for DP-SGD-Global-Adapt (Esipova et al., 2022) using MNIST data. We use the AdamW optimizer, the OCL LR scheduler, and a batch size of 64, and record the upper clipping threshold of DP-Global-Adapt after every iteration.
  • Figure 2: Average gradient norm of DP Global Adapt and all versions of DP Global Adapt v2 at every training iteration. We use the AdamW optimizer, the OCL LR scheduler, and a batch size of 64. During mini-batch training, we record the gradient norm of every sample and then average the 64 per-sample gradient norms in each iteration.
  • Figure 3: Noise multiplier progression during training of DP-Global-Adapt-V2 for all the decay schedulers. We use the formulae shown in Table 1 to compute the noise multiplier at every epoch (round). We use drop rates of 0.99, 0.01, and 0.5 for linear, time, and step decay, respectively. For step decay, the step size (epoch drop rate) is 10.
  • Figure 4: Training loss over epochs for all the decay schedulers. We use the MNIST dataset, the AdamW optimizer, the OCL LR scheduler, and a batch size of 64, run DP-SGD-Global-Adapt-V2 for 100 epochs of training, and record the loss after every epoch.
  • Figure 5: Convergence analysis on the MNIST dataset using training loss for DP-SGD, DP-PSAC, DP-Auto-S, DP-Global, DP-Global-Adapt, and DP-Global-Adapt-V2-S. Each algorithm is trained for 100 epochs with a privacy budget of 1, using the AdamW optimizer, a batch size of 64, and the OCL LR scheduler.
  • ...and 5 more figures
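
The step-decay schedule described in the Figure 3 caption can be sketched as follows. The paper's exact formula sits in a table that is not reproduced on this page, so the common step-decay form $\sigma_e = \sigma_0 \cdot k^{\lfloor e / s \rfloor}$ is assumed here, with drop rate $k = 0.5$ and step size $s = 10$ taken from the caption. The function name, the zero-based epoch index, and the initial value `sigma0` are assumptions made for illustration.

```python
def step_decay(sigma0: float, epoch: int,
               drop_rate: float = 0.5, step_size: int = 10) -> float:
    """Noise multiplier at a given epoch under an assumed step-decay form.

    Assumption: the value is multiplied by drop_rate once every step_size
    epochs, with epochs counted from 0. The paper's own formula may differ.
    """
    if epoch < 0 or step_size <= 0:
        raise ValueError("epoch must be >= 0 and step_size must be > 0")
    return sigma0 * drop_rate ** (epoch // step_size)


if __name__ == "__main__":
    # Illustrative starting value; not taken from the paper.
    sigma0 = 4.0
    for epoch in (0, 9, 10, 19, 20, 30):
        print(epoch, step_decay(sigma0, epoch))
```

The abstract states that the upper clipping threshold is also decayed step-wise; the same function shape could plausibly be reused for it, but its drop rate and step size are not given on this page.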

Theorems & Definitions (7)

  • Definition 3.1
  • Definition 3.2
  • Definition 3.3
  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Lemma 4