Re$^3$MCN: Cubic Newton + Variance Reduction + Momentum + Quadratic Regularization for Finite-sum Non-convex Problems

Dmitry Pasechnyuk-Vilensky, Dmitry Kamzolov, Martin Takáč

TL;DR

The paper addresses efficient attainment of second-order stationary points for nonconvex finite-sum optimization by combining variance-reduced stochastic cubic-regularized Newton updates with EMA-based SARAH estimators. The Re$^3$MCN algorithm uses cubic regularization, momentum-like EMA smoothing, and a proximal quadratic term to absorb stochastic noise, and employs a matrix-free Hutchinson Hessian approximation to stay scalable in high dimensions. The authors establish that, in the nonconvex setting, the method reaches an $(\varepsilon,\sqrt{L_2\varepsilon})$-SOSP with oracle complexity $n+\widetilde{\mathcal{O}}(n^{1/2}\varepsilon^{-3/2})$, and in the convex setting they derive a rate of $\widetilde{\mathcal{O}}\left(\frac{L R^3}{T^2}+\frac{\sigma_2 R^2}{T^2}+\frac{\sigma_1 R}{\sqrt{T}}\right)$. The work also introduces restarts and a fast inner solver for the cubic subproblem, and shows how Hutchinson-based Hessian estimation keeps the method effective while avoiding $\mathcal{O}(d^2)$ costs. Overall, the combination of EMA-SARAH variance reduction, cubic regularization, restarts, and matrix-free techniques yields a practically viable and theoretically grounded approach for large-scale nonconvex finite-sum optimization.
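To make the matrix-free idea concrete, the following is a minimal sketch of Hutchinson's estimator applied to the Hessian diagonal, assuming only a Hessian-vector product oracle `hvp(z) = H @ z` is available (in practice this would come from automatic differentiation or finite differences of gradients). Whether Re$^3$MCN estimates the diagonal or some other Hessian functional is not stated here, so the choice of target and the helper name `hutchinson_diag` are assumptions for illustration only.

```python
import numpy as np

def hutchinson_diag(hvp, d, num_samples, rng):
    """Estimate diag(H) from Hessian-vector products only (hypothetical helper).

    hvp(z) must return H @ z. With Rademacher probes z (entries +/-1),
    E[z * (H @ z)] = diag(H), so averaging z * hvp(z) gives an unbiased
    estimate while storing only O(d) numbers instead of the d x d matrix.
    """
    est = np.zeros(d)
    for _ in range(num_samples):
        z = rng.choice([-1.0, 1.0], size=d)  # Rademacher probe vector
        est += z * hvp(z)                     # elementwise product z_i * (Hz)_i
    return est / num_samples

# Usage: H is formed explicitly here only so that the estimate can be checked.
rng = np.random.default_rng(0)
d = 50
A = rng.standard_normal((d, d))
H = A @ A.T / d
approx = hutchinson_diag(lambda v: H @ v, d, num_samples=2000, rng=rng)
print(np.max(np.abs(approx - np.diag(H))))  # error shrinks like 1/sqrt(num_samples)
```

For a diagonal $H$ a single probe already returns the exact diagonal, because $z_i^2 = 1$; the variance of each entry comes only from the off-diagonal terms $\sum_{j\neq i} H_{ij}^2$.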

Abstract

We analyze a stochastic cubic-regularized Newton method for finite-sum optimization $\textstyle\min_{x\in\mathbb{R}^d} F(x) \;=\; \frac{1}{n}\sum_{i=1}^n f_i(x)$ that uses SARAH-type recursive variance reduction with mini-batches of size $b\sim n^{1/2}$ and exponential moving averages (EMA) for the gradient and Hessian estimators. We show that the method achieves an $(\varepsilon,\sqrt{L_2\varepsilon})$-second-order stationary point (SOSP) with $n + \widetilde{\mathcal{O}}(n^{1/2}\varepsilon^{-3/2})$ total stochastic oracle calls in the nonconvex case (Theorem 8.3) and a convergence rate of $\widetilde{\mathcal{O}}(\frac{L R^3}{T^2} + \frac{\sigma_2 R^2}{T^2} + \frac{\sigma_1 R}{\sqrt{T}})$ in the convex case (Theorem 6.1). We also treat the matrix-free variant based on Hutchinson's estimator for the Hessian and present a fast inner solver for the cubic subproblem with provable attainment of the required inexactness level.
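The abstract does not spell out how the EMA is combined with the SARAH recursion. The sketch below assumes a STORM-style mixing $v_t = g_B(x_t) + (1-a)\,(v_{t-1} - g_B(x_{t-1}))$, which reduces to plain SARAH at $a=0$ and to a plain mini-batch gradient at $a=1$. The toy least-squares finite sum, the function names, and the gradient step used in place of the cubic step are all hypothetical.

```python
import numpy as np

def make_problem(n, d, rng):
    # Toy finite sum f_i(x) = 0.5 * (a_i^T x - y_i)^2 (illustrative, not from the paper)
    A = rng.standard_normal((n, d))
    y = rng.standard_normal(n)
    return A, y

def batch_grad(A, y, x, idx):
    # Average gradient of the components f_i, i in idx (len(idx) oracle calls)
    Ab = A[idx]
    return Ab.T @ (Ab @ x - y[idx]) / len(idx)

def ema_sarah_step(A, y, x_new, x_old, v_old, idx, a):
    """One hypothetical EMA-SARAH gradient-estimator update.

    a = 0: SARAH recursion v = g_B(x_new) - g_B(x_old) + v_old.
    a = 1: plain mini-batch gradient g_B(x_new).
    Intermediate a blends the two (an assumed, STORM-style form).
    """
    g_new = batch_grad(A, y, x_new, idx)
    g_old = batch_grad(A, y, x_old, idx)  # same mini-batch at the previous iterate
    return g_new + (1.0 - a) * (v_old - g_old)

# Usage: one full gradient (n calls), then updates costing 2b calls each.
rng = np.random.default_rng(0)
n, d = 400, 10
A, y = make_problem(n, d, rng)
b = int(np.sqrt(n))                         # mini-batch size b ~ n^{1/2}
x_old = np.zeros(d)
v = batch_grad(A, y, x_old, np.arange(n))   # full gradient at the starting point
for _ in range(20):
    x_new = x_old - 0.05 * v                # gradient step standing in for the cubic step
    idx = rng.choice(n, size=b, replace=False)
    v = ema_sarah_step(A, y, x_new, x_old, v, idx, a=0.1)
    x_old = x_new
```

With a full-batch `idx`, the $a=0$ recursion reproduces the exact full gradient at every step, which is a convenient sanity check for any implementation of the recursion.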

Paper Structure

This paper contains 29 sections, 13 theorems, 83 equations, and 1 algorithm.

Key Result

Lemma 3.1

Let $m(s):= g^\top s + \tfrac{1}{2} s^\top H s + \tfrac{M}{6}\|s\|^3$ and $r:=\nabla m(s)=g+Hs+\frac{M}{2}\|s\|\,s$. Then
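The identity stated in Lemma 3.1 is not reproduced in this summary. One exact identity that follows from the definitions alone: since $r^\top s = g^\top s + s^\top H s + \tfrac{M}{2}\|s\|^3$, eliminating $s^\top H s$ gives $m(s) = \tfrac12 (g + r)^\top s - \tfrac{M}{12}\|s\|^3$ for every $s$. Whether this is the form used in the paper is an assumption. The sketch below checks it numerically for an indefinite symmetric $H$; the function names are illustrative.

```python
import numpy as np

def cubic_model(g, H, M, s):
    # m(s) = g^T s + 0.5 s^T H s + (M/6) ||s||^3
    return g @ s + 0.5 * s @ H @ s + M / 6.0 * np.linalg.norm(s) ** 3

def model_grad(g, H, M, s):
    # r = grad m(s) = g + H s + (M/2) ||s|| s   (requires H symmetric)
    return g + H @ s + 0.5 * M * np.linalg.norm(s) * s

def identity_rhs(g, H, M, s):
    # 0.5 (g + r)^T s - (M/12) ||s||^3, which should equal m(s) for every s
    r = model_grad(g, H, M, s)
    return 0.5 * (g + r) @ s - M / 12.0 * np.linalg.norm(s) ** 3

rng = np.random.default_rng(1)
d = 6
B = rng.standard_normal((d, d))
H = 0.5 * (B + B.T)          # symmetric, generally indefinite (nonconvex case)
g = rng.standard_normal(d)
s = rng.standard_normal(d)
M = 2.0
# The two sides agree up to floating-point rounding.
print(abs(cubic_model(g, H, M, s) - identity_rhs(g, H, M, s)))
```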

Theorems & Definitions (24)

  • Lemma 3.1: Exact identity for the cubic model value
  • proof
  • Lemma 3.2: Upper bound on the model value under inexactness
  • proof
  • Lemma 3.3: Bounding the error inner products
  • proof
  • Proposition 3.4: One-step expected decrease
  • proof
  • Lemma 4.1: Squared-weight sum
  • proof
  • ...and 14 more
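The paper's fast inner solver is not shown here. As a point of comparison, the following is a dense $\mathcal{O}(d^3)$ reference solver for the cubic subproblem, based on the classical Nesterov–Polyak characterization of its global minimizer: $(H + \lambda I)s = -g$ with $\lambda = \tfrac{M}{2}\|s\|$ and $H + \lambda I \succeq 0$. It bisects on $\lambda$ and assumes the "easy case" ($g$ has a nonzero component along the bottom eigenvector of $H$). Measuring inexactness of an inner solver by $\|r\|$ from Lemma 3.1 is an assumption about what Lemma 3.2 requires; the solver name is hypothetical.

```python
import numpy as np

def solve_cubic_subproblem(g, H, M, iters=200):
    """Dense reference solver for min_s g^T s + 0.5 s^T H s + (M/6)||s||^3.

    Bisects on lam so that (H + lam I) s = -g and lam = (M/2)||s||, with
    lam >= max(0, -lambda_min(H)). Assumes the easy case (no hard case handling).
    """
    w, Q = np.linalg.eigh(H)          # eigenvalues w in ascending order
    gq = Q.T @ g                      # g expressed in the eigenbasis of H

    def s_of(lam):
        return -Q @ (gq / (w + lam))

    def phi(lam):
        # Strictly decreasing on (max(0, -w[0]), inf); its root is the optimal lam
        return np.linalg.norm(s_of(lam)) - 2.0 * lam / M

    lo = max(0.0, -w[0])
    hi = lo + 1.0
    while phi(hi) > 0:                # grow the bracket until phi changes sign
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if phi(mid) > 0:
            lo = mid
        else:
            hi = mid
    return s_of(hi)

# Usage on a random indefinite problem; r is the residual defined in Lemma 3.1.
rng = np.random.default_rng(2)
d = 10
B = rng.standard_normal((d, d))
H = 0.5 * (B + B.T)
g = rng.standard_normal(d)
M = 1.0
s = solve_cubic_subproblem(g, H, M)
r = g + H @ s + 0.5 * M * np.linalg.norm(s) * s
print(np.linalg.norm(r))              # near machine precision for this exact solver
```

An inexact matrix-free solver can be validated against this reference on small instances by comparing model values and residual norms.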