PrivSGP-VR: Differentially Private Variance-Reduced Stochastic Gradient Push with Tight Utility Bounds
Zehan Zhu, Yan Huang, Xin Wang, Jinming Xu
TL;DR
This work introduces PrivSGP-VR, a fully decentralized method for non-convex learning that guarantees $(\epsilon_i,\delta_i)$-DP for each node via Gaussian noise and employs variance reduction to mitigate stochastic gradient variance. The authors prove a sub-linear convergence rate of $\mathcal{O}\left(1/\sqrt{nK}\right)$, which gives a linear speedup in the number of nodes $n$, and use the moments accountant to derive the iteration count $K$ that maximizes utility under a given privacy budget. The resulting utility bound is tight: it matches that of server-client counterparts and improves on existing decentralized counterparts by a factor of $1/\sqrt{n}$. The method operates over time-varying directed graphs using Stochastic Gradient Push with Push-Sum. Experiments with ResNet-18 on CIFAR-10 and shallow networks on MNIST demonstrate the linear speedup, the benefit of variance reduction, and favorable comparisons with DP-based decentralized baselines. Overall, PrivSGP-VR advances private distributed learning by providing per-node DP guarantees, tight utility bounds, and practical guidance for selecting the iteration budget under privacy constraints.
Abstract
In this paper, we propose a differentially private decentralized learning method (termed PrivSGP-VR) which employs stochastic gradient push with variance reduction and guarantees $(\epsilon, \delta)$-differential privacy (DP) for each node. Our theoretical analysis shows that, under DP Gaussian noise with constant variance, PrivSGP-VR achieves a sub-linear convergence rate of $\mathcal{O}(1/\sqrt{nK})$, where $n$ and $K$ are the number of nodes and iterations, respectively. This rate is independent of the stochastic gradient variance and yields a linear speedup with respect to $n$. Leveraging the moments accountant method, we further derive an optimal $K$ that maximizes the model utility under a given privacy budget in decentralized settings. With this optimized $K$, PrivSGP-VR achieves a tight utility bound of $\mathcal{O}\left( \sqrt{d\log \left( \frac{1}{\delta} \right)}/(\sqrt{n}J\epsilon) \right)$, where $J$ and $d$ are the number of local samples and the dimension of the decision variable, respectively. This bound matches that of the server-client distributed counterparts and exhibits an extra factor of $1/\sqrt{n}$ improvement over that of existing decentralized counterparts, such as A(DP)$^2$SGD. Extensive experiments corroborate our theoretical findings, especially in terms of the maximized utility with optimized $K$, in fully decentralized settings.
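To make the "stochastic gradient push with Gaussian noise" idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a fixed (rather than time-varying) directed graph, a toy quadratic loss per node, and an uncalibrated noise level `sigma`. It omits the variance-reduction step and any privacy accounting. The function names `column_stochastic_weights` and `noisy_gradient_push` are hypothetical.

```python
import numpy as np


def column_stochastic_weights(n):
    """Mixing matrix for a directed ring plus one extra edge 0 -> 2 (assumed topology).

    Column j holds the shares node j pushes to itself and its out-neighbours,
    so every column sums to 1 while the rows generally do not.
    """
    P = np.zeros((n, n))
    for j in range(n):
        receivers = {j, (j + 1) % n}
        if j == 0:
            receivers.add(2 % n)
        for i in receivers:
            P[i, j] = 1.0 / len(receivers)
    return P


def noisy_gradient_push(targets, steps=2000, lr=0.02, sigma=0.0, seed=0):
    """Gradient push on the toy losses f_i(z) = 0.5 * ||z - targets[i]||^2.

    Each node perturbs its local gradient with Gaussian noise of standard
    deviation `sigma` (illustrative only, not calibrated to any privacy budget)
    before pushing its update to its out-neighbours.
    """
    rng = np.random.default_rng(seed)
    n, d = targets.shape
    P = column_stochastic_weights(n)
    x = np.zeros((n, d))  # push-sum numerators, one row per node
    w = np.ones(n)        # push-sum weights
    for _ in range(steps):
        z = x / w[:, None]                      # de-biased local models
        grad = z - targets                      # exact local gradients of the toy loss
        noise = sigma * rng.standard_normal((n, d))
        x = P @ (x - lr * (grad + noise))       # local step, then push to out-neighbours
        w = P @ w                               # weights follow the same mixing
    return x / w[:, None], w


if __name__ == "__main__":
    targets = np.random.default_rng(1).normal(size=(5, 3))
    z, w = noisy_gradient_push(targets, sigma=1.0)
    print("network-average model:", z.mean(axis=0))
    print("average of local minimizers:", targets.mean(axis=0))
```

Dividing the numerators `x` by the push-sum weights `w` is what corrects for the imbalance of a directed graph, where a column-stochastic matrix is available but a doubly stochastic one generally is not. Increasing `sigma` in this toy setting illustrates the privacy-utility tension that the optimized $K$ is meant to balance.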
