Stochastic ISTA/FISTA Adaptive Step Search Algorithms for Convex Composite Optimization

Lam M. Nguyen, Katya Scheinberg, Trang H. Tran

TL;DR

The paper addresses convex composite optimization with $F(x)=f(x)+h(x)$ by developing stochastic ISTA and full backtracking FISTA variants that operate with biased gradient estimates. It introduces a stochastic first-order oracle and a backtracking step-search framework for proximal methods, establishing conditions under which guaranteed descent and accelerated convergence can be maintained in expectation. The authors derive explicit bounds on the expected iteration complexity $\mathbb{E}[N_\epsilon]$, including extensions to the accelerated proximal setting and a detailed analysis of noise decay requirements. The results generalize prior deterministic and stochastic line-search analyses to accelerated proximal methods, offering practical guidance for adaptive step-size selection under stochastic, possibly biased, gradient information. The work thus provides theoretically grounded guarantees for stochastic proximal-gradient methods with backtracking, relevant to large-scale convex optimization problems in imaging and data science.

Abstract

We develop and analyze stochastic variants of ISTA and of a full backtracking FISTA algorithm [Beck and Teboulle, 2009, Scheinberg et al., 2014] for composite optimization without the assumption that the stochastic gradient is an unbiased estimator. This work extends the analysis of inexact fixed-step ISTA/FISTA in [Schmidt et al., 2011] to the case of stochastic gradient estimates and an adaptive step-size parameter chosen by backtracking. It also extends the framework for analyzing stochastic line-search methods in [Cartis and Scheinberg, 2018] to the proximal gradient framework as well as to accelerated first-order methods.
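
To make the idea of a proximal step search with inexact gradients concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a toy problem $f(x)=\tfrac12\|x-b\|^2$, $h(x)=\lambda\|x\|_1$, exact function values in the acceptance test, and a hypothetical oracle that returns the true gradient plus Gaussian noise. The names `ista_step_search`, `gamma`, `alpha_max`, and `noise` are illustrative choices. A trial proximal step is accepted (and the step size enlarged) when the usual quadratic upper-bound test holds; otherwise it is rejected and the step size is shrunk.

```python
import random


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1, applied coordinate-wise."""
    return [vi - t if vi > t else vi + t if vi < -t else 0.0 for vi in v]


def ista_step_search(b, lam, alpha0=4.0, alpha_max=4.0, gamma=0.5,
                     noise=0.0, iters=200, seed=0):
    """Sketch of ISTA with an adaptive step search on a toy lasso problem.

    Assumptions (not taken from the paper): f(x) = 0.5 * ||x - b||^2,
    h(x) = lam * ||x||_1, exact values of f in the acceptance test, and a
    gradient oracle equal to the true gradient plus Gaussian noise.
    """
    rng = random.Random(seed)

    def f(x):
        return 0.5 * sum((xi - bi) ** 2 for xi, bi in zip(x, b))

    x = [0.0] * len(b)
    alpha = alpha0
    for _ in range(iters):
        # Inexact gradient estimate from the hypothetical noisy oracle.
        g = [(xi - bi) + noise * rng.gauss(0.0, 1.0) for xi, bi in zip(x, b)]
        # Trial proximal-gradient step with the current step size.
        trial = soft_threshold([xi - alpha * gi for xi, gi in zip(x, g)],
                               alpha * lam)
        d = [ti - xi for ti, xi in zip(trial, x)]
        # Quadratic upper-bound model built from the gradient estimate.
        model = (f(x) + sum(gi * di for gi, di in zip(g, d))
                 + sum(di * di for di in d) / (2.0 * alpha))
        if f(trial) <= model + 1e-12:
            # Successful step: accept the trial point, enlarge the step size.
            x = trial
            alpha = min(alpha / gamma, alpha_max)
        else:
            # Unsuccessful step: keep x, shrink the step size.
            alpha *= gamma
    return x, alpha
```

With `noise=0.0` and `b = [3.0, -0.5, 1.5]`, `lam = 1.0`, the sketch recovers the closed-form minimizer obtained by soft-thresholding `b` at level `lam`. The cap `alpha_max` is a design choice of this sketch that keeps the step size from growing without bound once the iterate stops moving.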

Paper Structure

This paper contains 15 sections, 12 theorems, 128 equations, and 3 algorithms.

Key Result

Lemma 1

Suppose that iteration $k$ is true, i.e. [...] with $\kappa \leq \frac{1}{3}$, where $D_{\alpha_k}(y_k)$ is the gradient mapping of $F$ at $y_k$. If [...], then the $k$-th step is successful.
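
For reference, a common convention for the gradient mapping of $F = f + h$ with step size $\alpha > 0$ is given below. This is the standard textbook definition and is an assumption here; the paper's normalization may differ.

$$
D_{\alpha}(y) = \frac{1}{\alpha}\Bigl(y - \operatorname{prox}_{\alpha h}\bigl(y - \alpha \nabla f(y)\bigr)\Bigr),
\qquad
\operatorname{prox}_{\alpha h}(z) = \arg\min_{u}\Bigl\{ h(u) + \frac{1}{2\alpha}\|u - z\|^2 \Bigr\}.
$$

Under this convention $D_{\alpha}(y)$ reduces to $\nabla f(y)$ when $h \equiv 0$, and for convex $F$ it vanishes exactly at the minimizers of $F$.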

Theorems & Definitions (23)

  • Definition 1: Successful Iteration
  • Definition 2: True Iteration
  • Lemma 1
  • proof
  • Theorem 1: Bounding $\mathbb{E} (N_\epsilon)$ based on $\mathbb{E}(N_G)$
  • proof
  • Lemma 2: Bounds for a successful step
  • proof
  • Lemma 3
  • proof
  • ...and 13 more