A Short and Unified Convergence Analysis of the SAG, SAGA, and IAG Algorithms

Feng Zhu; Robert W. Heath; Aritra Mitra

TL;DR

The paper delivers a unified, high-probability convergence analysis for SAG, SAGA, and IAG. It first bounds the gradient staleness caused by random sub-sampling using Bernstein concentration, and then constructs a novel Lyapunov function that accounts for the resulting delayed gradient information. This two-step approach yields linear convergence guarantees that hold for both the stochastic and the deterministic variants, and the analysis extends to non-convex objectives and Markov sampling. An immediate byproduct is substantially tighter convergence rates for the deterministic IAG method, bringing its guarantees closer to those of stochastic variance-reduced (VR) methods. The framework is short, modular, and adaptable, and offers a foundation for tail bounds for more advanced VR algorithms and settings. More broadly, the work clarifies the dynamics of variance-reduced methods and broadens their applicability to large-scale, real-world optimization problems.
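To see how Bernstein concentration can produce a window length of the form used in Lemma 3.3, $\tau \geq (8N/3)\log(NK/\delta)$, here is a short calculation. It is our own reconstruction, not the paper's proof, and it assumes that at each iteration one index $i_t$ is drawn uniformly and independently from $[N]$, and that bounded staleness means every component is drawn at least once in each of the (at most $K$) windows of $\tau$ consecutive iterations.

```latex
\begin{aligned}
&X_t = \tfrac{1}{N} - \mathbf{1}\{i_t = i\}, \qquad \mathbb{E}[X_t] = 0, \qquad |X_t| \le 1, \qquad
 \sum_{t=1}^{\tau} \mathbb{E}[X_t^2] = \tau \cdot \tfrac{1}{N}\Big(1 - \tfrac{1}{N}\Big) \le \tfrac{\tau}{N}, \\
&\Pr\big(i \text{ is never drawn in a window of length } \tau\big)
 \le \Pr\Big(\sum_{t=1}^{\tau} X_t \ge \tfrac{\tau}{N}\Big)
 \le \exp\!\Big(-\frac{(\tau/N)^2/2}{\tau/N + \tau/(3N)}\Big) = \exp\!\Big(-\frac{3\tau}{8N}\Big), \\
&\text{union bound over } N \text{ components and } K \text{ windows:}\quad
 NK\, e^{-3\tau/(8N)} \le \delta \iff \tau \ge \frac{8N}{3}\log\frac{NK}{\delta}.
\end{aligned}
```

Under these assumptions the Bernstein variance term $\tau/N$ and range term $\tau/(3N)$ combine to give exactly the constant $8N/3$ that appears in Lemma 3.3.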

Abstract

Stochastic variance-reduced algorithms such as Stochastic Average Gradient (SAG) and SAGA, and their deterministic counterparts like the Incremental Aggregated Gradient (IAG) method, have been extensively studied in large-scale machine learning. Despite their popularity, existing analyses for these algorithms are disparate, relying on different proof techniques tailored to each method. Furthermore, the original proof of SAG is known to be notoriously involved, requiring computer-aided analysis. Focusing on finite-sum optimization with smooth and strongly convex objective functions, our main contribution is to develop a single unified convergence analysis that applies to all three algorithms: SAG, SAGA, and IAG. Our analysis features two key steps: (i) establishing a bound on delays due to stochastic sub-sampling using simple concentration tools, and (ii) carefully designing a novel Lyapunov function that accounts for such delays. The resulting proof is short and modular, providing the first high-probability bounds for SAG and SAGA that can be seamlessly extended to non-convex objectives and Markov sampling. As an immediate byproduct of our new analysis technique, we obtain the best known rates for the IAG algorithm, significantly improving upon prior bounds.
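The three algorithms can be read as one template: keep a table of the most recently computed component gradients, refresh one entry per iteration, and step along a direction built from the table. The sketch below is a minimal illustration under our own assumptions: a hypothetical least-squares finite sum $f(x) = \frac{1}{N}\sum_i \frac{1}{2}(a_i^\top x - b_i)^2$, a table initialized with the gradients at the starting point, and illustrative step sizes ($1/(16L)$ for SAG and $1/(3L)$ for SAGA follow commonly cited choices from the original SAG and SAGA papers; $1/(16LN)$ for IAG is only a conservative guess, not the constant derived in this paper). The names `grad_i`, `run`, and `steps` are hypothetical.

```python
import numpy as np

# Hypothetical toy finite sum: f(x) = (1/N) * sum_i 0.5 * (a_i @ x - b_i)**2.
A = np.array([[1.0, 0, 0], [0, 1, 0], [0, 0, 1],
              [1, 1, 0], [0, 1, 1], [1, 0, 1]])
x_star = np.array([1.0, -2.0, 0.5])
b = A @ x_star                       # consistent system: x_star minimizes every f_i
N, d = A.shape
L = np.max(np.sum(A**2, axis=1))     # f_i is ||a_i||^2-smooth, so L = max_i ||a_i||^2


def grad_i(x, i):
    """Gradient of the i-th component f_i(x) = 0.5 * (a_i @ x - b_i)**2."""
    return (A[i] @ x - b[i]) * A[i]


def run(method, alpha, iters, seed=0):
    """Run 'sag', 'saga' or 'iag' from x = 0 and return the final iterate."""
    rng = np.random.default_rng(seed)
    x = np.zeros(d)
    table = np.array([grad_i(x, i) for i in range(N)])  # stored, possibly stale, gradients
    avg = table.mean(axis=0)                              # running average of the table
    for k in range(iters):
        # IAG sweeps the components cyclically; SAG and SAGA sample uniformly at random.
        i = k % N if method == "iag" else int(rng.integers(N))
        g_new = grad_i(x, i)
        diff = g_new - table[i]
        if method == "saga":
            direction = avg + diff       # unbiased: old average plus (fresh - stale)
        else:
            direction = avg + diff / N   # SAG/IAG: average of the refreshed table (biased)
        avg = avg + diff / N             # O(d) update of the running average
        table[i] = g_new
        x = x - alpha * direction
    return x


steps = {"sag": 1 / (16 * L), "saga": 1 / (3 * L), "iag": 1 / (16 * L * N)}  # illustrative
errors = {m: np.linalg.norm(run(m, a, iters=20_000) - x_star) for m, a in steps.items()}
print(errors)
```

SAG and IAG step along the average of a partly stale table, which is a biased estimate of $\nabla f(x)$; SAGA adds the correction $\nabla f_i(x) - y_i$ so that the direction is unbiased under uniform sampling. IAG replaces random sampling with a cyclic sweep, so its staleness is deterministic and never exceeds $N - 1$ iterations, whereas for SAG and SAGA the staleness is random, which is what a high-probability bound on delays has to control.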

Paper Structure

This paper contains 12 sections, 12 theorems, and 64 equations.

Introduction
Technical Background
Analysis and Results
Step 1: Bounding the Staleness from Sub-Sampling
Bounding the Gradient Error
Step 2: Designing the Lyapunov Function
Extension to Non-Convex Objectives
Extension to Markov Sampling
Extension to the IAG Method
Conclusion
Proof of Lemma [lem:burnin]
Proof of Lemma [lem:bounded_markov]

Key Result

Lemma 3.3 (Bounded Staleness)

For any $\delta\in(0,1)$ and $\tau\geq (8N/3)\log (NK/\delta)$, with probability at least $1-\delta$, the following holds:
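Lemma 3.3 controls how long a component gradient can go unrefreshed when indices are sub-sampled. The Monte Carlo sketch below is a sanity check under our own reading of the lemma, not a quote of it: it assumes indices are drawn uniformly and independently from $\{0, \dots, N-1\}$ at each iteration, and that the staleness of component $i$ at iteration $k$ is $k$ minus the last iteration $\le k$ at which $i$ was drawn. The helpers `staleness_threshold` and `max_staleness` are hypothetical.

```python
import math
import random


def staleness_threshold(N, K, delta):
    """Smallest integer tau with tau >= (8N/3) * log(NK/delta), as in Lemma 3.3."""
    return math.ceil((8 * N / 3) * math.log(N * K / delta))


def max_staleness(N, K, tau, seed=0):
    """Simulate K iterations of uniform sampling from {0, ..., N-1}.

    Returns the largest staleness k - last[i] over all components i and all
    iterations tau <= k < K, where last[i] is the most recent iteration <= k
    at which component i was drawn (-1 if it has never been drawn).
    """
    rng = random.Random(seed)
    last = [-1] * N
    worst = 0
    for k in range(K):
        i = rng.randrange(N)   # SAG/SAGA-style uniform, independent sub-sampling
        last[i] = k            # component i's stored gradient is refreshed now
        if k >= tau:           # skip the first tau iterations, before a full window exists
            worst = max(worst, max(k - t for t in last))
    return worst


tau = staleness_threshold(50, 10_000, 0.01)
print(tau)                               # 2364
print(max_staleness(50, 10_000, tau))
```

For example, $N = 50$, $K = 10{,}000$ and $\delta = 0.01$ give $\tau = \lceil (400/3)\log(5 \cdot 10^{7}) \rceil = 2364$, and the simulated worst-case staleness can be compared against this value. Because the $N$ components always have distinct last-draw times, the worst staleness is at least $N - 1$ at every iteration, so the interesting question is only how far above $N - 1$ it climbs.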

Theorems & Definitions (22)

Definition 3.1: Smoothness
Definition 3.2: Strong Convexity
Lemma 3.3: Bounded Staleness (with proof)
Lemma 3.4: Approximate Descent (with proof)
Lemma 3.5: Gradient Error (with proof)
Corollary 3.6: Gradient Bound
Lemma 3.9: One-Step Recursion
...and 12 more
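For orientation, the standard forms of smoothness (Definition 3.1) and strong convexity (Definition 3.2) in the finite-sum setting are sketched here. Requiring each $f_i$ to be $L$-smooth and the average $f$ to be $\mu$-strongly convex is our assumption about the paper's exact hypotheses.

```latex
\begin{aligned}
&f(x) = \frac{1}{N}\sum_{i=1}^{N} f_i(x), \\
&\text{($L$-smoothness)}\quad \|\nabla f_i(x) - \nabla f_i(y)\| \le L\,\|x - y\| \quad \text{for all } x, y \text{ and } i \in [N], \\
&\text{($\mu$-strong convexity)}\quad f(y) \ge f(x) + \langle \nabla f(x),\, y - x \rangle + \tfrac{\mu}{2}\,\|y - x\|^2 \quad \text{for all } x, y.
\end{aligned}
```

The ratio $\kappa = L/\mu$ is the condition number that typically enters the linear rates of SAG, SAGA, and IAG.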