Second-Order Convergence in Private Stochastic Non-Convex Optimization
Youming Tao, Zuyuan Zhang, Dongxiao Yu, Xiuzhen Cheng, Falko Dressler, Di Wang
TL;DR
This work tackles the problem of finding second-order stationary points (SOSP) under differential privacy in stochastic non-convex optimization. It introduces Gauss-PSGD, a generic Gaussian-perturbed SGD framework with a model-drift-based saddle-point escape criterion, combined with Ada-DP-SPIDER as an adaptive gradient oracle to correct prior error-rate bounds. The authors establish DP-SOSP guarantees with a centralized rate of $\alpha = \tilde{O}\left(\frac{1}{n^{1/3}} + \left(\frac{\sqrt{d}}{n\epsilon}\right)^{2/5}\right)$ and extend these results to distributed settings with heterogeneous data, achieving $\alpha = \tilde{O}\left(\frac{1}{(mn)^{1/3}} + \left(\frac{\sqrt{d}}{\sqrt{m}n\epsilon}\right)^{2/5}\right)$. The framework inherently outputs a DP-SOSP without private model selection, mitigating privacy and communication overhead in distributed environments, and is supported by rigorous descent and escape analyses. Experimental results on MNIST and CIFAR-10 corroborate improved accuracy and faster convergence under varying privacy budgets. Overall, the approach provides tighter DP-SOSP guarantees and a practical, scalable solution for privacy-preserving non-convex optimization in distributed systems.
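To give a feel for how the two stated rates scale, the following minimal sketch evaluates the centralized and distributed expressions while dropping all constants and logarithmic factors hidden by $\tilde{O}$. The function names and the sample values of $n$, $d$, $\epsilon$, and $m$ are illustrative assumptions, not taken from the paper, and the outputs indicate scaling only, not actual error guarantees.

```python
import math


def centralized_rate(n, d, eps):
    """Leading terms of the centralized rate, ignoring constants and log factors.

    n: number of samples, d: model dimension, eps: privacy budget.
    """
    statistical_term = n ** (-1 / 3)
    privacy_term = (math.sqrt(d) / (n * eps)) ** (2 / 5)
    return statistical_term + privacy_term


def distributed_rate(m, n, d, eps):
    """Leading terms of the distributed rate with m clients holding n samples each."""
    statistical_term = (m * n) ** (-1 / 3)
    privacy_term = (math.sqrt(d) / (math.sqrt(m) * n * eps)) ** (2 / 5)
    return statistical_term + privacy_term


if __name__ == "__main__":
    # Hypothetical problem sizes chosen only for illustration.
    n, d, eps = 10**6, 100, 1.0
    print("centralized:", centralized_rate(n, d, eps))
    for m in (1, 8, 64):
        print(f"distributed (m={m}):", distributed_rate(m, n, d, eps))
```

With $m = 1$ the distributed expression reduces to the centralized one, and both terms shrink as $m$ grows, which matches the form of the two bounds above.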
Abstract
We investigate the problem of finding second-order stationary points (SOSP) in differentially private (DP) stochastic non-convex optimization. Existing methods suffer from two key limitations: (i) inaccurate convergence error rates caused by overlooking gradient variance in the saddle point escape analysis, and (ii) dependence on auxiliary private model selection procedures for identifying DP-SOSP, which can significantly impair utility, particularly in distributed settings. To address these issues, we propose a generic perturbed stochastic gradient descent (PSGD) framework built upon Gaussian noise injection and general gradient oracles. A core innovation of our framework is the use of model drift distance to determine whether PSGD escapes saddle points, ensuring convergence to approximate local minima without relying on second-order information or additional DP-SOSP identification. By leveraging the adaptive DP-SPIDER estimator as a specific gradient oracle, we develop a new DP algorithm that rectifies the convergence error rates reported in prior work. We further extend this algorithm to distributed learning with heterogeneous data, providing the first formal guarantees for finding DP-SOSP in such settings. Our analysis also highlights the detrimental impact of private selection procedures in distributed learning under high-dimensional models, underscoring the practical benefits of our design. Numerical experiments on real-world datasets validate the efficacy of our approach.
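The drift-based escape test described above can be illustrated with a toy sketch. This is not the authors' Gauss-PSGD algorithm and it carries no privacy guarantee: the objective, the noisy `grad_oracle`, and every threshold (`grad_tol`, `drift_tol`, `escape_steps`, `sigma`) are hypothetical choices made only to show the idea. When the gradient estimate is small, the iterate is recorded as an anchor; if the noisy iterates then drift far from the anchor, the point is treated as a saddle and descent continues, otherwise the anchor is returned as an approximate local minimum.

```python
import math
import random


def grad_oracle(x, rng, noise_std=0.01):
    """Noisy gradient of the toy objective f(x) = x0^2/2 - x1^2/2 + x1^4/4.

    The origin is a saddle point; the local minima sit at x = (0, +1) and (0, -1).
    """
    return [x[0] + rng.gauss(0.0, noise_std),
            -x[1] + x[1] ** 3 + rng.gauss(0.0, noise_std)]


def norm(v):
    return math.sqrt(sum(c * c for c in v))


def perturbed_step(x, g, rng, eta, sigma):
    """One SGD step with Gaussian noise of standard deviation sigma added to g."""
    return [xi - eta * (gi + rng.gauss(0.0, sigma)) for xi, gi in zip(x, g)]


def drift_psgd(x0, rng, eta=0.1, sigma=0.01, grad_tol=0.05,
               drift_tol=0.3, escape_steps=100, max_iters=5000):
    x = list(x0)
    t = 0
    while t < max_iters:
        g = grad_oracle(x, rng)
        if norm(g) > grad_tol:
            # Large gradient: ordinary perturbed descent step.
            x = perturbed_step(x, g, rng, eta, sigma)
            t += 1
            continue
        # Small gradient: remember this point and watch how far we drift from it.
        anchor = list(x)
        escaped = False
        for _ in range(escape_steps):
            x = perturbed_step(x, grad_oracle(x, rng), rng, eta, sigma)
            t += 1
            if norm([a - b for a, b in zip(x, anchor)]) > drift_tol:
                escaped = True  # large drift: the anchor was likely a saddle
                break
        if not escaped:
            return anchor  # small drift: accept the anchor as the output
    return x


if __name__ == "__main__":
    # Start on the stable manifold of the saddle, so plain descent heads to the origin.
    print(drift_psgd([1.0, 0.0], random.Random(0)))
```

Starting from `[1.0, 0.0]`, noiseless gradient descent would converge to the saddle at the origin; in this sketch the injected noise should push the iterate along the negative-curvature direction, the drift test should detect the escape, and the run should end near one of the two minima. Note that the output is selected by the drift test alone, with no second-order information and no separate selection step.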
