Re$^3$MCN: Cubic Newton + Variance Reduction + Momentum + Quadratic Regularization for Finite-sum Non-convex Problems
Dmitry Pasechnyuk-Vilensky, Dmitry Kamzolov, Martin Takáč
TL;DR
The paper targets second-order stationary points of nonconvex finite-sum problems by combining stochastic cubic-regularized Newton updates with SARAH-type variance reduction and exponential moving averages (EMA). The Re$^3$MCN algorithm uses cubic regularization, momentum-like EMA smoothing, and a proximal quadratic term to absorb estimator noise, and it relies on a matrix-free Hutchinson Hessian approximation to remain scalable in high dimensions. In the nonconvex setting, the authors show that the method reaches an $(\varepsilon,\sqrt{L_2\varepsilon})$-SOSP with oracle complexity $\widetilde{\mathcal{O}}(n+n^{1/2}\varepsilon^{-3/2})$; in the convex setting, they derive the rate $\widetilde{\mathcal{O}}\left(\frac{L R^3}{T^2}+\frac{\sigma_2 R^2}{T^2}+\frac{\sigma_1 R}{\sqrt{T}}\right)$. The work also introduces restarts and a fast inner solver for the cubic subproblem, and shows that Hutchinson-based Hessian estimation avoids $\mathcal{O}(d^2)$ costs while preserving effectiveness. Together, these components give an approach to large-scale nonconvex finite-sum optimization that is practical and backed by complexity guarantees.
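To illustrate why a Hutchinson-type estimator avoids $\mathcal{O}(d^2)$ costs, the following is a minimal sketch of the classical estimator of the Hessian diagonal, $\mathbb{E}[z \odot (Hz)] = \operatorname{diag}(H)$ for Rademacher $z$. It only needs Hessian-vector products, approximated here by central differences of gradients. How Re$^3$MCN uses the estimator is not specified in this summary; the function names, the finite-difference step `eps`, and the number of probes are assumptions made for illustration.

```python
import random

def hvp_finite_diff(grad, x, v, eps=1e-5):
    """Approximate H(x) @ v by a central difference of gradients.

    No d x d matrix is ever formed: the cost is two gradient calls.
    """
    x_plus = [xi + eps * vi for xi, vi in zip(x, v)]
    x_minus = [xi - eps * vi for xi, vi in zip(x, v)]
    g_plus, g_minus = grad(x_plus), grad(x_minus)
    return [(gp - gm) / (2.0 * eps) for gp, gm in zip(g_plus, g_minus)]

def hutchinson_diag(grad, x, num_probes=100, seed=0):
    """Hutchinson-style estimate of diag(H(x)) from Rademacher probes.

    Each probe z has i.i.d. +/-1 entries, so E[z z^T] = I and
    E[z * (H z)] = diag(H). Memory stays O(d).
    """
    rng = random.Random(seed)
    d = len(x)
    est = [0.0] * d
    for _ in range(num_probes):
        z = [rng.choice((-1.0, 1.0)) for _ in range(d)]
        hz = hvp_finite_diff(grad, x, z)
        for i in range(d):
            est[i] += z[i] * hz[i] / num_probes
    return est

# Toy quadratic F(x) = 0.5 * x^T A x with A = [[2, 1], [1, 3]] (hypothetical data).
grad_quadratic = lambda x: [2.0 * x[0] + x[1], x[0] + 3.0 * x[1]]
diag_estimate = hutchinson_diag(grad_quadratic, [1.0, -1.0], num_probes=2000)
```

Summing the entries of the returned vector gives the corresponding trace estimate. For a diagonal Hessian a single probe is already exact, since the off-diagonal cross terms that cause the variance vanish.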
Abstract
We analyze a stochastic cubic-regularized Newton method for finite-sum optimization $\textstyle\min_{x\in\mathbb{R}^d} F(x) \;=\; \frac{1}{n}\sum_{i=1}^n f_i(x)$ that uses SARAH-type recursive variance reduction with mini-batches of size $b\sim n^{1/2}$ and exponential moving averages (EMA) for the gradient and Hessian estimators. We show that the method achieves an $(\varepsilon,\sqrt{L_2\varepsilon})$-second-order stationary point (SOSP) with $n + \widetilde{\mathcal{O}}(n^{1/2}\varepsilon^{-3/2})$ total stochastic oracle calls in the nonconvex case (Theorem 8.3) and the convergence rate $\widetilde{\mathcal{O}}\left(\frac{L R^3}{T^2} + \frac{\sigma_2 R^2}{T^2} + \frac{\sigma_1 R}{\sqrt{T}}\right)$ in the convex case (Theorem 6.1). We also treat a matrix-free variant based on Hutchinson's estimator for the Hessian and present a fast inner solver for the cubic subproblem that provably attains the required inexactness level.
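The SARAH-type recursion with mini-batches of size $b\sim n^{1/2}$ can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: it shows only the standard SARAH gradient estimator, $v_t = v_{t-1} + \frac{1}{b}\sum_{i\in B_t}\bigl(\nabla f_i(x_t)-\nabla f_i(x_{t-1})\bigr)$, started from one full gradient (the $n$ term in the oracle count). Plain gradient steps replace the cubic Newton step, and the EMA smoothing and Hessian estimator are omitted. All function names, the step size `lr`, and the toy data are hypothetical.

```python
import math
import random

def full_gradient(grads, x):
    """Average of all n component gradients (costs n oracle calls)."""
    n, d = len(grads), len(x)
    total = [0.0] * d
    for g in grads:
        gi = g(x)
        for k in range(d):
            total[k] += gi[k]
    return [t / n for t in total]

def sarah_update(grads, v_prev, x_new, x_old, batch):
    """One SARAH step: v_prev plus the batch mean of gradient differences."""
    d = len(x_new)
    diff = [0.0] * d
    for i in batch:
        g_new, g_old = grads[i](x_new), grads[i](x_old)
        for k in range(d):
            diff[k] += (g_new[k] - g_old[k]) / len(batch)
    return [vp + dk for vp, dk in zip(v_prev, diff)]

def run_sarah_gd(grads, x0, steps=50, lr=0.1, seed=0):
    """Gradient steps driven by the SARAH estimator (stand-in for the Newton step)."""
    rng = random.Random(seed)
    n = len(grads)
    b = max(1, math.isqrt(n))                # batch size about sqrt(n)
    x_old = list(x0)
    v = full_gradient(grads, x_old)          # initial full pass over the data
    x = [xi - lr * vi for xi, vi in zip(x_old, v)]
    for _ in range(steps):
        batch = rng.sample(range(n), b)      # b oracle pairs per iteration
        v = sarah_update(grads, v, x, x_old, batch)
        x_old, x = x, [xi - lr * vi for xi, vi in zip(x, v)]
    return x, v

# Toy problem (hypothetical): f_i(x) = 0.5 * ||x - c_i||^2, minimized at mean(c_i).
centers = [[0.0, 1.0], [2.0, 3.0], [1.0, 2.0], [1.0, 2.0]]
grads = [(lambda x, c=c: [xi - ci for xi, ci in zip(x, c)]) for c in centers]
x_final, v_final = run_sarah_gd(grads, [0.0, 0.0])
```

On this toy problem every $f_i$ has the same Hessian, so the gradient differences do not depend on $i$ and the recursive estimator coincides with the full gradient. In general the estimator accumulates error between full-gradient refreshes, which is where restarts and smoothing become relevant.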
