Some Unified Theory for Variance Reduced Prox-Linear Methods

Yue Wu; Benjamin Grimmer

TL;DR

The paper studies variance-reduced prox-linear methods for nonconvex, nonsmooth composite problems of the form $\Phi(x)=f(g(x))+h(x)$ where $g$ is a differentiable map given as a finite sum or expectation. It develops a unified convergence theory that relies on operator-norm bounds on Jacobians, enabling high-probability guarantees and tolerating inexact proximal computations. The framework accommodates a broad class of VR estimators (mini-batch, SVRG/SARAH-like, exact-eval variants) and provides sharp, dimension-aware rates, improving prior Frobenius-norm-based analyses. The results quantify the trade-offs between evaluating $g_\xi$ and $g_\xi'$ and show how to achieve $\epsilon$-stationarity with near-optimal oracle complexities across several VR schemes, including randomized epoch durations. Overall, the work advances a principled, scalable approach to variance-reduced optimization in composite, nonconvex settings with nonsmooth regularizers, with implications for efficiency in large-scale machine learning and nonlinear programming tasks.
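For orientation, one common way to write a stochastic prox-linear step for this problem is sketched below. The step size $t>0$ and the estimator symbols $\tilde g_k$ and $\tilde J_k$ are notation assumed here for illustration and may differ from the paper's; the "$\approx$" on the left stands in for the allowed inexactness in the proximal subproblem, whose precise tolerance is not stated in this summary.

```latex
x_{k+1} \;\approx\; \operatorname*{argmin}_{x}\;
  f\bigl(\tilde g_k + \tilde J_k\,(x - x_k)\bigr) + h(x) + \frac{1}{2t}\,\lVert x - x_k\rVert^2,
\qquad
\tilde g_k \approx g(x_k), \quad \tilde J_k \approx g'(x_k).
```

In this reading, a variance-reduced scheme is a rule for building $\tilde g_k$ and $\tilde J_k$ from sampled $g_\xi$ and $g_\xi'$ evaluations.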

Abstract

This work considers the nonconvex, nonsmooth problem of minimizing a composite objective of the form $f(g(x))+h(x)$, where the inner mapping $g$ is a smooth finite sum or expectation amenable to variance reduction. In such settings, prox-linear methods can enjoy variance-reduced speed-ups despite the nonsmoothness. We provide a unified convergence theory applicable to a wide range of common variance-reduced vector and Jacobian constructions. All the technical conditions we require of variance-reduced methods can be summarized in a single unified assumption. Our theory (i) requires only operator-norm bounds on Jacobians (whereas prior works used potentially much larger Frobenius norms), (ii) provides state-of-the-art high-probability guarantees, and (iii) allows inexactness in proximal computations.
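To make the remark about "potentially much larger Frobenius norms" concrete, here is a minimal Python sketch, not taken from the paper, that compares the two norms on a small matrix. The helper names `frobenius_norm` and `operator_norm` are hypothetical, and the operator norm is only estimated by power iteration on $A^\top A$, so treat it as an illustration under those assumptions.

```python
import math


def frobenius_norm(A):
    """Frobenius norm: square root of the sum of squared entries."""
    return math.sqrt(sum(v * v for row in A for v in row))


def operator_norm(A, iters=50):
    """Estimate the operator (spectral) norm by power iteration on A^T A.

    Illustrative helper only: a fixed iteration count and a fixed
    starting vector are assumptions made to keep the sketch short.
    """
    m, n = len(A), len(A[0])
    # Deterministic, non-uniform starting vector, normalized to unit length.
    x = [float(j + 1) for j in range(n)]
    scale = math.sqrt(sum(v * v for v in x))
    x = [v / scale for v in x]
    for _ in range(iters):
        y = [sum(a * xi for a, xi in zip(row, x)) for row in A]      # y = A x
        z = [sum(A[i][j] * y[i] for i in range(m)) for j in range(n)]  # z = A^T y
        norm_z = math.sqrt(sum(v * v for v in z))
        if norm_z == 0.0:
            return 0.0
        x = [v / norm_z for v in z]
    y = [sum(a * xi for a, xi in zip(row, x)) for row in A]
    return math.sqrt(sum(v * v for v in y))  # ||A x|| for unit-norm x


# The n-by-n identity has operator norm 1 but Frobenius norm sqrt(n).
n = 100
identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
print(frobenius_norm(identity), operator_norm(identity))
```

For the identity matrix the gap is a factor of $\sqrt{n}$, which is the general worst case: the Frobenius norm is at most $\sqrt{\operatorname{rank}(A)}$ times the operator norm and never smaller than it.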
