Probabilistic Guarantees of Stochastic Recursive Gradient in Non-Convex Finite Sum Problems
Yanjie Zhong, Jiaqi Li, Soumendra Lahiri
TL;DR
This work addresses non-convex finite-sum optimization. It introduces Prob-SARAH, a SARAH-based variance-reduced method, together with a novel dimension-free Azuma–Hoeffding-type bound for martingale differences with random bounds. Using this concentration inequality, the authors establish high-probability bounds on the gradient estimator and derive a near-optimal in-probability complexity $Comp(\varepsilon,\delta)=\tilde{O}_{L,\Delta_f,\alpha_M}(1/\varepsilon^3 \wedge \sqrt{n}/\varepsilon^2)$; they also introduce the notion of $\varepsilon$-semi-independence. The new inequality is the key methodological contribution, since it enables a rigorous probabilistic analysis of SARAH-style updates in the non-convex setting. Experiments on logistic regression with non-convex regularization and on a two-layer neural network support the theory. The guarantees hold with high probability rather than only in expectation, and the analysis may extend to other SARAH-family algorithms.
Abstract
This paper develops a new dimension-free Azuma-Hoeffding type bound on the norm of the sum of a martingale difference sequence with random individual bounds. With this novel result, we provide high-probability bounds for the gradient norm estimator in the proposed algorithm Prob-SARAH, which is a modified version of the StochAstic Recursive grAdient algoritHm (SARAH), a state-of-the-art variance-reduced algorithm that achieves optimal computational complexity in expectation for the finite-sum problem. The in-probability complexity of Prob-SARAH matches the best in-expectation result up to logarithmic factors. Empirical experiments demonstrate the superior probabilistic performance of Prob-SARAH on real datasets compared to other popular algorithms.
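
For readers unfamiliar with the base method, the recursive gradient estimator used by plain SARAH can be sketched as follows. This is a minimal illustration of the standard SARAH update only; it does not implement the Prob-SARAH modifications, whose details are not given in this abstract. The names `sarah_epoch` and `grad_i`, and all parameter names, are hypothetical and chosen for illustration.

```python
import numpy as np


def sarah_epoch(grad_i, w0, n, step_size, inner_steps, rng):
    """One outer iteration of plain SARAH (illustrative, not Prob-SARAH).

    grad_i(w, i) is assumed to return the gradient of the i-th
    component function f_i at the point w, for i in 0..n-1.
    """
    # The recursion starts from the full gradient at the snapshot point.
    v = np.mean([grad_i(w0, i) for i in range(n)], axis=0)
    w_prev = w0
    w = w0 - step_size * v
    for _ in range(inner_steps):
        i = rng.integers(n)  # sample one component uniformly at random
        # Recursive estimator: correct the previous estimate by the
        # change in the sampled component's gradient between iterates.
        v = grad_i(w, i) - grad_i(w_prev, i) + v
        w_prev, w = w, w - step_size * v
    return w
```

When the sum has a single component (`n = 1`), the correction term makes `v` equal to the exact gradient at every step, so the sketch reduces to plain gradient descent. In general, `v` equals the full gradient only at the snapshot point and then accumulates one sampled correction per inner step, which is why the analysis of SARAH-type methods tracks a martingale difference sequence.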
