A Short Survey of Averaging Techniques in Stochastic Gradient Methods

K. Lakshmanan

TL;DR

This survey of averaging techniques in stochastic gradient optimization reviews the theoretical foundations of averaged stochastic approximation, discusses modern developments in stochastic gradient methods, and examines applications of averaging in machine learning.

Abstract

Stochastic gradient methods are among the most widely used algorithms for large-scale optimization and machine learning. A key technique for improving the statistical efficiency and stability of these methods is the use of averaging schemes applied to the sequence of iterates generated during optimization. Starting from the classical work on stochastic approximation, averaging techniques such as Polyak--Ruppert averaging have been shown to achieve optimal asymptotic variance and improved convergence behavior. In recent years, averaging methods have gained renewed attention in machine learning applications, particularly in the training of deep neural networks and large-scale learning systems. Techniques such as tail averaging, exponential moving averages, and stochastic weight averaging have demonstrated strong empirical performance and improved generalization properties. This paper provides a survey of averaging techniques in stochastic gradient optimization. We review the theoretical foundations of averaged stochastic approximation, discuss modern developments in stochastic gradient methods, and examine applications of averaging in machine learning. In addition, we summarize recent results on the finite-sample behavior of averaging schemes and highlight several open problems and directions for future research.
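To make the averaging schemes named in the abstract concrete, the following is a minimal sketch and is not taken from the paper. It runs SGD on the toy objective f(x) = 0.5 x^2 with Gaussian gradient noise and tracks a uniform (Polyak--Ruppert) average, a tail average, and an exponential moving average. The function name `sgd_with_averages`, the toy objective, and all parameter values (`lr`, `noise`, `tail_frac`, `ema_decay`) are illustrative assumptions.

```python
import random


def sgd_with_averages(steps=2000, lr=0.05, noise=1.0,
                      tail_frac=0.5, ema_decay=0.99, seed=0):
    """Run SGD on f(x) = 0.5 * x**2 with noisy gradients (a toy problem,
    assumed here for illustration) and track three averages of the iterates."""
    rng = random.Random(seed)
    x = 5.0                      # arbitrary starting point far from the optimum 0
    running_avg = 0.0            # uniform average of all iterates (Polyak--Ruppert)
    ema = x                      # exponential moving average, initialized at x0
    tail_start = int(steps * (1 - tail_frac))
    tail_sum, tail_count = 0.0, 0

    for t in range(1, steps + 1):
        # Exact gradient of f is x; add zero-mean Gaussian noise.
        grad = x + noise * rng.gauss(0.0, 1.0)
        x -= lr * grad

        # Incremental form of the uniform average: avg_t = avg_{t-1} + (x_t - avg_{t-1}) / t
        running_avg += (x - running_avg) / t

        # EMA weights recent iterates more heavily.
        ema = ema_decay * ema + (1 - ema_decay) * x

        # Tail average uses only the last tail_frac fraction of iterates.
        if t > tail_start:
            tail_sum += x
            tail_count += 1

    return {
        "last": x,
        "polyak_ruppert": running_avg,
        "tail": tail_sum / tail_count,
        "ema": ema,
    }


if __name__ == "__main__":
    for name, value in sgd_with_averages().items():
        print(f"{name:>15s}: {value:+.4f}")
```

With a constant step size as assumed here, the uniform average still carries the early transient iterates, whereas the tail average discards them. Setting `noise=0.0` isolates this effect: the tail average is then far closer to the optimum than the uniform average.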

Paper Structure

This paper contains 40 sections, 18 equations, 4 figures, 2 tables.

Introduction
Background: Stochastic Approximation and Gradient Methods
Stochastic Approximation
Stochastic Gradient Descent
Convergence Properties
Polyak--Ruppert Averaging
The Averaged Stochastic Gradient Algorithm
Asymptotic Optimality
Historical Development
Interpretation as Variance Reduction
Limitations and Practical Considerations
Tail Averaging and Window Averaging
Tail Averaging
Window Averaging
Weighted Averaging Schemes
...and 25 more sections

Figures (4)

Figure 1: Main categories of averaging techniques in stochastic optimization.
Figure 2: Historical development of averaging techniques in stochastic optimization.
Figure 3: Stochastic gradient iterates exhibit noisy behavior, while averaged iterates provide a smoother trajectory.
Figure 4: Stochastic optimization typically exhibits a transient phase followed by a stationary phase where iterates fluctuate near the optimum.
