Some Unified Theory for Variance Reduced Prox-Linear Methods

Yue Wu, Benjamin Grimmer

TL;DR

The paper studies variance-reduced (VR) prox-linear methods for nonconvex, nonsmooth composite problems of the form $\Phi(x)=f(g(x))+h(x)$, where $g$ is a differentiable map given as a finite sum or expectation. It develops a unified convergence theory that relies only on operator-norm bounds on Jacobians, provides high-probability guarantees, and tolerates inexact proximal computations. The framework accommodates a broad class of VR estimators (mini-batch, SVRG/SARAH-like, exact-evaluation variants) and provides sharp, dimension-aware rates, improving on prior Frobenius-norm-based analyses. The results quantify the trade-offs between evaluating $g_\xi$ and $g_\xi'$ and show how to achieve $\epsilon$-stationarity with near-optimal oracle complexities across several VR schemes, including schemes with randomized epoch durations. Overall, the work advances a principled, scalable approach to variance-reduced optimization in composite, nonconvex settings with nonsmooth regularizers, with implications for efficiency in large-scale machine learning and nonlinear programming tasks.
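
For orientation, two standard variance-reduced estimator templates for the inner map at an iterate $x_k$ can be written as follows. This is a sketch in assumed notation ($\tilde g_k$ for the estimate, $\tilde x$ for an epoch reference point, $\xi_k$ for the sampled index), not necessarily the exact constructions analyzed in the paper; analogous formulas apply to Jacobian estimates with $g_\xi'$ in place of $g_\xi$.

$$
\begin{aligned}
\text{SVRG-type:}\quad & \tilde g_k = g_{\xi_k}(x_k) - g_{\xi_k}(\tilde x) + g(\tilde x),\\
\text{SARAH-type:}\quad & \tilde g_k = \tilde g_{k-1} + g_{\xi_k}(x_k) - g_{\xi_k}(x_{k-1}).
\end{aligned}
$$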

Abstract

This work considers the nonconvex, nonsmooth problem of minimizing a composite objective of the form $f(g(x))+h(x)$, where the inner mapping $g$ is a smooth finite sum or expectation amenable to variance reduction. In such settings, prox-linear methods can enjoy variance-reduced speed-ups despite the presence of nonsmoothness. We provide a unified convergence theory applicable to a wide range of common variance-reduced vector and Jacobian constructions. All the technical conditions we require for variance-reduced methods can be summarized in a single unified assumption. Our theory (i) only requires operator norm bounds on Jacobians (whereas prior works used potentially much larger Frobenius norms), (ii) provides state-of-the-art high probability guarantees, and (iii) allows inexactness in proximal computations.
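
As a reminder of the setting, a prox-linear step in its standard textbook form linearizes only the inner map $g$ and keeps $f$ and $h$ intact. The display below is a sketch in assumed notation (step size $t>0$, vector estimate $\tilde g_k \approx g(x_k)$, Jacobian estimate $\tilde J_k \approx g'(x_k)$); the paper's precise update and its inexactness criterion may differ.

$$
x_{k+1} \approx \operatorname*{arg\,min}_{y \in \mathbb{R}^n} \; f\big(\tilde g_k + \tilde J_k (y - x_k)\big) + h(y) + \frac{1}{2t}\,\|y - x_k\|^2 .
$$

The gap between the two Jacobian norms mentioned in (i) follows from a standard inequality: for any matrix $A \in \mathbb{R}^{m \times n}$,

$$
\|A\|_{\mathrm{op}} \;\le\; \|A\|_F \;\le\; \sqrt{\min(m,n)}\;\|A\|_{\mathrm{op}},
$$

so a Frobenius-norm bound can exceed the corresponding operator-norm bound by a dimension-dependent factor.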

Paper Structure

This paper contains 23 sections, 30 theorems, 141 equations.

Key Result

Proposition 2.1

For any $x,y \in \mathbb{R}^n$,

Theorems & Definitions (61)

  • Proposition 2.1
  • Theorem 3.1
  • Lemma 3.3
  • Corollary 3.4
  • Lemma 3.6
  • Corollary 3.7: Algorithmic guarantee
  • Corollary 3.8: Optimized complexity bounds
  • Lemma 3.9
  • Corollary 3.10: Algorithmic guarantee
  • Corollary 3.11: Optimized complexity bounds
  • ...and 51 more