Unified Theory of Adaptive Variance Reduction

Aleksandr Shestakov, Valery Parfenov, Aleksandr Beznosikov

TL;DR

The paper addresses scalable stochastic optimization by unifying variance-reduction techniques with adaptive, parameter-free step sizes. It introduces a broad unified framework that accommodates biased estimators, enabling convergence guarantees for nonconvex and PL regimes across finite-sum, distributed, and coordinate settings. The main contributions are a new adaptive variance-reduction family, a unified analysis independent of unbiasedness, and extensive experiments showing competitive, tuning-free performance. This work advances practical stochastic optimization by reducing hyperparameter tuning while preserving optimal convergence characteristics across diverse problem classes.

Abstract

Variance reduction is a family of powerful mechanisms for stochastic optimization that has proved helpful in many machine learning tasks. It is based on estimating the exact gradient with recursive sequences. Many earlier papers demonstrated that methods with unbiased variance-reduction estimators can be described in a single framework. We generalize this approach and show that the unbiasedness assumption is excessive; hence, we include biased estimators in the analysis. The main contribution of our work, however, is a set of new variance-reduction methods with adaptive step sizes that are adjusted throughout the algorithm's iterations and do not need hyperparameter tuning. Our analysis covers finite-sum problems, distributed optimization, and coordinate methods. Numerical experiments on various tasks validate the effectiveness of our methods.
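To make "estimating the exact gradient with recursive sequences" concrete, here is a minimal sketch of one such estimator, the PAGE recursion (one of the baselines named in Figure 1), on a toy one-dimensional least-squares finite sum. The function name `page_least_squares`, the toy data, and the fixed `step` and `p` values are illustrative assumptions. The paper's adaptive step-size rule is not reproduced in this summary, so a constant step is used instead.

```python
import random


def page_least_squares(a, b, x0=0.0, step=0.02, p=0.2, iters=2000, seed=0):
    """PAGE-style recursion on f(x) = (1/n) * sum_i 0.5 * (a[i] * x - b[i])**2.

    Assumption: a constant step size stands in for the paper's adaptive rule.
    """
    rng = random.Random(seed)
    n = len(a)

    def grad_i(i, x):
        # Gradient of the i-th summand.
        return a[i] * (a[i] * x - b[i])

    def full_grad(x):
        # Exact gradient of the finite sum.
        return sum(grad_i(i, x) for i in range(n)) / n

    x = x0
    g = full_grad(x)  # start the recursion from the exact gradient
    for _ in range(iters):
        x_new = x - step * g
        if rng.random() < p:
            # With probability p, refresh the estimator with the exact gradient.
            g = full_grad(x_new)
        else:
            # Otherwise reuse the old estimator and correct it with a
            # single-sample gradient difference (a biased recursive update).
            i = rng.randrange(n)
            g = g + grad_i(i, x_new) - grad_i(i, x)
        x = x_new
    return x


# Toy data with exact minimizer x* = 2.
x_hat = page_least_squares([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

The estimator `g` is never recomputed from scratch except on the occasional refresh; between refreshes it is carried forward and corrected, which is the recursive structure the abstract refers to.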

Paper Structure

This paper contains 33 sections, 42 theorems, 132 equations, 16 figures, 1 algorithm.

Key Result

theorem 1

Let $f$ be $L$-smooth and satisfy Assumption as:unified. Then Algorithm alg:gen_sgd with step size for any $T > 0$ achieves where $V^0 = f(x^0) - f_* + \frac{\gamma}{2\rho_1} \left\|g^0 - \nabla f(x^{0})\right\|^2 + \frac{\gamma A}{2\rho_1\rho_2} \sigma^2_0$.
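As a small worked illustration of the initial potential $V^0$ defined in the theorem, the sketch below evaluates it term by term. The function name `initial_potential` and the numeric inputs are hypothetical; only the formula itself comes from the statement above, and the roles of $\rho_1$, $\rho_2$, $A$ and $\sigma_0^2$ are fixed by the unified assumption, which this summary does not reproduce.

```python
def initial_potential(f_x0, f_star, g0, grad_f_x0, gamma, rho1, rho2, A, sigma0_sq):
    """Evaluate V^0 = f(x^0) - f_* + gamma/(2*rho1) * ||g^0 - grad f(x^0)||^2
    + gamma*A/(2*rho1*rho2) * sigma_0^2 for vectors given as plain lists."""
    # Squared Euclidean distance between the initial estimator and the true gradient.
    err_sq = sum((gi - di) ** 2 for gi, di in zip(g0, grad_f_x0))
    gap = f_x0 - f_star
    estimator_term = gamma / (2.0 * rho1) * err_sq
    sigma_term = gamma * A / (2.0 * rho1 * rho2) * sigma0_sq
    return gap + estimator_term + sigma_term


# Hypothetical numbers, chosen only to make the arithmetic easy to follow:
# gap = 2.0, estimator term = 0.5 * 1.0, sigma term = 2.0 * 1.0.
v0 = initial_potential(3.0, 1.0, [1.0, 0.0], [0.0, 0.0],
                       gamma=0.5, rho1=0.5, rho2=0.5, A=2.0, sigma0_sq=1.0)
```

When the estimator is initialized with the exact gradient ($g^0 = \nabla f(x^0)$) and $\sigma_0^2 = 0$, the last two terms vanish and $V^0$ reduces to the function-value gap $f(x^0) - f_*$.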

Figures (16)

  • Figure 1: Results on the a9a dataset showing convergence behaviour of SAGA, PAGE, ZeroSARAH and EF21 with theoretical, tuned and adaptive stepsize.
  • ...and 15 more figures

Theorems & Definitions (63)

  • definition 1
  • definition 2
  • theorem 1
  • theorem 2
  • theorem 3
  • lemma 1
  • corollary 1
  • lemma 2
  • corollary 2
  • lemma 3
  • ...and 53 more