PrivSGP-VR: Differentially Private Variance-Reduced Stochastic Gradient Push with Tight Utility Bounds

Zehan Zhu; Yan Huang; Xin Wang; Jinming Xu

TL;DR

This work introduces PrivSGP-VR, a fully decentralized method for non-convex learning that provides per-node $(\epsilon_i,\delta_i)$-DP via Gaussian noise while employing variance reduction to eliminate the effect of stochastic gradient noise. The authors prove a sub-linear convergence rate of $\mathcal{O}\left(1/\sqrt{nK}\right)$, with linear speedup in the number of nodes $n$, and use the moments accountant to derive the iteration count $K$ that maximizes utility under a given privacy budget. The resulting utility bound is tight: it matches that of server-client counterparts and improves on existing decentralized bounds by a factor of $1/\sqrt{n}$. The method operates over time-varying directed graphs using Stochastic Gradient Push with Push-Sum. Experiments with ResNet-18 on CIFAR-10 and shallow networks on MNIST show linear speedup, the benefit of variance reduction, and favorable comparisons to DP-based decentralized baselines. Overall, PrivSGP-VR advances private distributed learning by providing per-node DP guarantees, tight utility bounds, and practical guidance for selecting the iteration budget under privacy constraints.
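To make the combination of Push-Sum mixing and Gaussian noise more concrete, here is a minimal Python sketch of a noisy stochastic gradient push loop. It is not the authors' algorithm: it assumes a fixed (not time-varying) directed graph, adds the noise directly to each local gradient, and omits the variance-reduced gradient estimator. The names `column_stochastic`, `noisy_sgp`, and the 4-node graph `OUT` are hypothetical, chosen only for illustration.

```python
import numpy as np

def column_stochastic(out_neighbors, n):
    """Build P with P[i, j] = 1/outdeg(j) if node j sends to node i (self-loops included)."""
    P = np.zeros((n, n))
    for j, receivers in out_neighbors.items():
        for i in receivers:
            P[i, j] = 1.0 / len(receivers)
    return P

def noisy_sgp(grad, P, x0, gamma, sigma, K, rng):
    """Run K rounds of gradient push with Gaussian-perturbed local gradients.

    grad(i, z) returns node i's local gradient at z (assumed interface).
    Returns the de-biased models z, one row per node.
    """
    n, d = x0.shape
    x = x0.astype(float).copy()   # push-sum numerators
    w = np.ones(n)                # push-sum weights
    z = x / w[:, None]            # de-biased local models
    for _ in range(K):
        g = np.stack([grad(i, z[i]) for i in range(n)])
        g = g + sigma * rng.standard_normal((n, d))  # Gaussian noise on each node's gradient
        x = P @ (x - gamma * g)   # local step, then push to out-neighbors
        w = P @ w                 # weights are mixed with the same matrix
        z = x / w[:, None]        # ratio corrects the bias of directed mixing
    return z

# Hypothetical 4-node directed graph with self-loops and unequal out-degrees,
# so P is column-stochastic but not doubly stochastic.
OUT = {0: [0, 1, 2], 1: [1, 2], 2: [2, 3], 3: [3, 0]}
P = column_stochastic(OUT, 4)
```

With `gamma=0.0` the loop reduces to plain Push-Sum averaging: every row of `z` approaches the average of the rows of `x0`, even though `P` is not doubly stochastic. That de-biasing is the reason each node tracks the weight `w` alongside `x`.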

Abstract

In this paper, we propose a differentially private decentralized learning method (termed PrivSGP-VR) which employs stochastic gradient push with variance reduction and guarantees $(\epsilon, \delta)$-differential privacy (DP) for each node. Our theoretical analysis shows that, under DP Gaussian noise with constant variance, PrivSGP-VR achieves a sub-linear convergence rate of $\mathcal{O}(1/\sqrt{nK})$, where $n$ and $K$ are the number of nodes and iterations, respectively. This rate is independent of the stochastic gradient variance and exhibits a linear speedup with respect to $n$. Leveraging the moments accountant method, we further derive an optimal $K$ that maximizes the model utility under a given privacy budget in decentralized settings. With this optimized $K$, PrivSGP-VR achieves a tight utility bound of $\mathcal{O}\left( \sqrt{d\log \left( \frac{1}{\delta} \right)}/(\sqrt{n}J\epsilon) \right)$, where $J$ and $d$ are the number of local samples and the dimension of the decision variable, respectively. This bound matches that of the server-client distributed counterparts and exhibits an extra factor of $1/\sqrt{n}$ improvement over existing decentralized counterparts, such as A(DP)$^2$SGD. Extensive experiments corroborate our theoretical findings, especially in terms of the maximized utility with optimized $K$, in fully decentralized settings.
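The comparison in the last part of the abstract can be written out explicitly. The block below only restates the two orders of magnitude implied by the abstract; the symbols $U_{\text{PrivSGP-VR}}$ and $U_{\text{dec}}$ are shorthand introduced here, not notation from the paper, and constants are ignored.

```latex
% Utility bound stated in the abstract for PrivSGP-VR with the optimized K:
\[
  U_{\text{PrivSGP-VR}}
  = \mathcal{O}\!\left( \frac{\sqrt{d \log(1/\delta)}}{\sqrt{n}\, J \epsilon} \right).
\]
% An extra 1/sqrt(n) improvement means the bound of the existing
% decentralized counterparts is larger by a factor of sqrt(n):
\[
  U_{\text{dec}}
  = \mathcal{O}\!\left( \frac{\sqrt{d \log(1/\delta)}}{J \epsilon} \right),
  \qquad
  \frac{U_{\text{dec}}}{U_{\text{PrivSGP-VR}}} \sim \sqrt{n}.
\]
```

For instance, with $n = 16$ nodes and the same $d$, $J$, $\epsilon$, and $\delta$, the two bounds differ by a factor of about $\sqrt{16} = 4$, up to constants.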

Paper Structure (31 sections, 11 theorems, 104 equations, 10 figures, 1 table, 2 algorithms)


Introduction
Related Works
Algorithm Development
Stochastic gradient push over time-varying directed graphs.
Ensuring differential privacy guarantee for each node.
Eliminating the stochastic gradient noise.
Theoretical Analysis
Convergence Guarantee
Privacy and Utility Guarantee
Experiments
Deep CNN ResNet-18 training
Linear speedup under constant DP Gaussian noise variance.
Optimizing number of iterations under certain privacy budget.
Trade-off between the maximized model utility and privacy guarantee.
Verifying the effectiveness of variance reduction technique.
...and 16 more sections

Key Result

Theorem 1

Suppose Assumptions Ass_weight_matrix-assumption_bounede_outer_variation hold. Let $K$ be the total number of iterations and $f^*=\min_{x\in \mathbb{R}^d} f(x)$. If the step-size is set as $\gamma=\sqrt{n/K}$, then there exist constants $C$ and $q \in [0,1)$, which depe… where $F^0=f(x^0)-f^*$, $C$ and $q$ can be found in Lemma def_of_C_and_q, and the def…
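The step-size rule and the linear-speedup claim can be illustrated with a few lines of Python. This is a rough sketch only: it keeps just the leading $1/\sqrt{nK}$ term of the rate and drops every constant and higher-order term, so the numbers are relative, not predictions. The function names `step_size` and `iterations_for_target` are hypothetical.

```python
import math

def step_size(n, K):
    """Step-size gamma = sqrt(n / K), as set in the theorem statement."""
    return math.sqrt(n / K)

def iterations_for_target(n, target):
    """Iterations K at which the leading term 1/sqrt(n*K) equals `target`.

    Constants and higher-order terms are ignored (assumption), so only
    ratios between different n are meaningful.
    """
    return 1.0 / (n * target ** 2)

# Relative iteration counts for the node counts used in Figure 1.
for n in (8, 16, 32, 64):
    print(n, iterations_for_target(n, target=0.01))
```

Under this simplification, doubling the number of nodes halves the number of iterations needed to reach the same target, which is what "linear speedup" refers to.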

Figures (10)

Figure 1: Comparison of convergence performance for PrivSGP-VR over 8, 16, 32 and 64 nodes under the same DP Gaussian noise variance, when training ResNet-18 on CIFAR-10.
Figure 2: Comparison of convergence performance for PrivSGP-VR over 16 nodes with different total numbers of iterations $K$ under a given privacy budget, when training ResNet-18 on CIFAR-10.
Figure 3: Performance of running PrivSGP-VR for $K^*$ ($K$) iterations under different privacy budgets $\epsilon$, when training ResNet-18 on CIFAR-10.
Figure 4: Comparison of convergence performance for PrivSGP-VR with PrivSGP over 16 nodes under the same DP Gaussian noise variance, when training ResNet-18 on CIFAR-10.
Figure 5: Comparison of convergence performance for PrivSGP-VR with DP$^2$-SGD and A(DP)$^2$SGD over 16 nodes with $(3,10^{-5})$-DP guarantee for each node, when training ResNet-18 on CIFAR-10.
...and 5 more figures

Theorems & Definitions (26)

Definition 1: $(\epsilon,\delta)$-DP
Theorem 1: Convergence Rate
proof
Remark 1
Theorem 2: Privacy Guarantee
proof
Corollary 1: Maximized Utility Guarantee
proof
Remark 2
Remark 3
...and 16 more
