PrivSGP-VR: Differentially Private Variance-Reduced Stochastic Gradient Push with Tight Utility Bounds

Zehan Zhu, Yan Huang, Xin Wang, Jinming Xu

TL;DR

This work introduces PrivSGP-VR, a fully decentralized learning method for non-convex problems that provides per-node $(\epsilon_i,\delta_i)$-DP via Gaussian noise while employing variance reduction to mitigate stochastic gradient noise. The authors prove a sub-linear convergence rate of $\mathcal{O}\left(1/\sqrt{nK}\right)$ with linear speedup in the number of nodes $n$, and use the moments accountant to derive an optimal iteration count $K$ that maximizes utility. The resulting privacy-aware utility bound is tight: it matches that of server-client counterparts and improves on existing decentralized counterparts by a factor of $1/\sqrt{n}$. The method operates over time-varying directed graphs using Stochastic Gradient Push with Push-Sum, and it is validated through experiments on ResNet-18/CIFAR-10 and shallow networks on MNIST, which demonstrate linear speedup, the benefit of variance reduction, and favorable comparisons to DP-based decentralized baselines. Overall, PrivSGP-VR advances private distributed learning by providing per-node DP guarantees, tight utility bounds, and practical guidance for selecting the iteration budget under privacy constraints.
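To make the combination of Push-Sum mixing and Gaussian noise concrete, here is a minimal NumPy sketch of one noisy stochastic-gradient-push step for all nodes at once. It is an illustration under stated assumptions, not the paper's algorithm: the variance-reduction component is omitted, the noise is assumed to be added to each local gradient before the local step, and the names `dp_sgp_step`, `directed_ring`, `P`, `gamma`, and `sigma` are hypothetical.

```python
import numpy as np


def directed_ring(n):
    """Column-stochastic mixing matrix for a directed ring (hypothetical helper).

    P[i, j] is the weight node j sends to node i. Each node keeps half of its
    mass and pushes the other half to its successor, so every column sums to 1.
    """
    eye = np.eye(n)
    return 0.5 * eye + 0.5 * np.roll(eye, 1, axis=0)


def dp_sgp_step(x, w, P, grads, gamma, sigma, rng):
    """One noisy push-sum gradient step (sketch; variance reduction omitted).

    x:     (n, d) push-sum numerators, one row per node
    w:     (n,)   push-sum weights
    P:     (n, n) column-stochastic mixing matrix for this iteration
    grads: (n, d) local stochastic gradients evaluated at z = x / w[:, None]
    gamma: step size
    sigma: standard deviation of the Gaussian noise (assumed per coordinate)
    """
    # Assumption: each node perturbs its own gradient with Gaussian noise.
    noise = sigma * rng.standard_normal(x.shape)
    x_half = x - gamma * (grads + noise)
    # Push-sum mixing: numerators and weights are mixed with the same matrix.
    x_new = P @ x_half
    w_new = P @ w
    # De-biased parameters: the ratio corrects for the non-doubly-stochastic P.
    z_new = x_new / w_new[:, None]
    return x_new, w_new, z_new
```

Because `P` is only column-stochastic, the mixing step conserves the total mass of `x` and `w` but not each node's share, which is why the de-biased ratio `z` is the quantity each node would evaluate gradients at. A time-varying graph can be modeled by passing a different `P` at every call.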

Abstract

In this paper, we propose a differentially private decentralized learning method (termed PrivSGP-VR) which employs stochastic gradient push with variance reduction and guarantees $(\epsilon, \delta)$-differential privacy (DP) for each node. Our theoretical analysis shows that, under DP Gaussian noise with constant variance, PrivSGP-VR achieves a sub-linear convergence rate of $\mathcal{O}(1/\sqrt{nK})$, where $n$ and $K$ are the number of nodes and iterations, respectively. This rate is independent of the stochastic gradient variance and achieves a linear speedup with respect to $n$. Leveraging the moments accountant method, we further derive an optimal $K$ that maximizes the model utility under a given privacy budget in decentralized settings. With this optimized $K$, PrivSGP-VR achieves a tight utility bound of $\mathcal{O}\left( \sqrt{d\log \left( \frac{1}{\delta} \right)}/(\sqrt{n}J\epsilon) \right)$, where $J$ and $d$ are the number of local samples and the dimension of the decision variable, respectively. This bound matches that of the server-client distributed counterparts and exhibits an extra factor of $1/\sqrt{n}$ improvement over existing decentralized counterparts, such as A(DP)$^2$SGD. Extensive experiments corroborate our theoretical findings, especially in terms of the maximized utility with optimized $K$, in fully decentralized settings.
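As a rough illustration of how the stated utility bound scales, the following sketch evaluates $\sqrt{d\log(1/\delta)}/(\sqrt{n}J\epsilon)$ with every hidden constant set to 1. The function name `utility_scale` and the example values are hypothetical, and the output is only meaningful for comparing settings, not as an actual error value.

```python
import math


def utility_scale(d, delta, n, J, eps):
    """Scaling of the utility bound sqrt(d * log(1/delta)) / (sqrt(n) * J * eps).

    Constants hidden by the O(.) notation are set to 1 (assumption), so only
    ratios between two settings are informative.
    """
    return math.sqrt(d * math.log(1.0 / delta)) / (math.sqrt(n) * J * eps)


# Example values chosen for illustration only.
few_nodes = utility_scale(d=10_000, delta=1e-5, n=4, J=5_000, eps=3.0)
many_nodes = utility_scale(d=10_000, delta=1e-5, n=16, J=5_000, eps=3.0)
ratio = few_nodes / many_nodes  # sqrt(16 / 4), up to floating-point rounding
```

Quadrupling the number of nodes halves the bound, which is the $1/\sqrt{n}$ dependence highlighted in the abstract.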

Paper Structure (31 sections, 11 theorems, 104 equations, 10 figures, 1 table, 2 algorithms)

Key Result

Theorem 1

Suppose Assumptions Ass_weight_matrix through assumption_bounede_outer_variation hold. Let $K$ be the total number of iterations and $f^*=\underset{x\in \mathbb{R}^d}{\min}f\left( x \right)$. If the step-size is set as $\gamma=\sqrt{\frac{n}{K}}$, then there exist constants $C$ and $q \in [0,1)$, which depe [...] where $F^0=f\left( x^0 \right) -f^*$, $C$ and $q$ can be found in Lemma def_of_C_and_q, and the def [...]
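The step-size choice $\gamma=\sqrt{n/K}$ and the $1/\sqrt{nK}$ rate quoted in the abstract can be checked numerically with a small sketch. The helper names `step_size` and `rate_term` are hypothetical, and `rate_term` drops every constant and lower-order term of the theorem, so it shows the scaling only.

```python
import math


def step_size(n, K):
    """Step size gamma = sqrt(n / K) for n nodes and K iterations."""
    return math.sqrt(n / K)


def rate_term(n, K):
    """Leading-order scaling 1 / sqrt(n * K), with constants omitted (assumption)."""
    return 1.0 / math.sqrt(n * K)


gamma = step_size(n=16, K=10_000)        # about 0.04
# Linear speedup: 4x the nodes reach the same level in 1/4 of the iterations.
same_level = rate_term(4, 400) == rate_term(16, 100)
```

In this simplified view, the product $nK$ alone determines the leading term, which is what linear speedup in $n$ means here.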

Figures (10)

  • Figure 1: Comparison of convergence performance for PrivSGP-VR over 8, 16, 32 and 64 nodes under the same DP Gaussian noise variance, when training ResNet-18 on CIFAR-10.
  • Figure 2: Comparison of convergence performance for PrivSGP-VR over 16 nodes for different total numbers of iterations $K$ under a fixed privacy budget, when training ResNet-18 on CIFAR-10.
  • Figure 3: Performance of running PrivSGP-VR for $K^*$ ($K$) iterations under different privacy budgets $\epsilon$, when training ResNet-18 on CIFAR-10.
  • Figure 4: Comparison of convergence performance for PrivSGP-VR with PrivSGP over 16 nodes under the same DP Gaussian noise variance, when training ResNet-18 on CIFAR-10.
  • Figure 5: Comparison of convergence performance for PrivSGP-VR with DP$^2$-SGD and A(DP)$^2$SGD over 16 nodes with $(3,10^{-5})$-DP guarantee for each node, when training ResNet-18 on CIFAR-10.
  • ...and 5 more figures

Theorems & Definitions (26)

  • Definition 1: $(\epsilon,\delta)$-DP
  • Theorem 1: Convergence Rate
  • proof
  • Remark 1
  • Theorem 2: Privacy Guarantee
  • proof
  • Corollary 1: Maximized Utility Guarantee
  • proof
  • Remark 2
  • Remark 3
  • ...and 16 more