Randomized Kaczmarz with geometrically smoothed momentum

Seth J. Alderman; Roan W. Luikart; Nicholas F. Marshall

TL;DR

The paper introduces randomized Kaczmarz with geometrically smoothed momentum (KGSM), a method for solving linear least-squares problems, and proves a directional convergence result for the expected error along singular vectors, extending prior work on directional decay. KGSM adds a momentum term controlled by a smoothing parameter $\beta$ and a momentum factor $M$, which yields a closed-form expression for $\mathbb{E}\langle x_{k+1}-x, v_l\rangle$ in terms of $r=1-\sigma_l^2/\|A\|_F^2+M(1-\beta)$ and $\zeta=M(1-\beta)^2$. The authors show that, for $M$ and $\beta$ in suitable ranges, KGSM can accelerate convergence in directions associated with small singular values; that it reduces to standard randomized Kaczmarz when $M=0$; and that $\beta$ can be chosen to optimize the smoothing. A rich set of numerical experiments illustrates the dynamics, including regimes with complex eigenvalues and periodic spiking behavior. The work motivates further exploration of adaptive parameter selection, extensions to Nesterov-like schemes, and block methods, with implications for faster linear-system solvers and for understanding momentum-based stochastic optimization in linear settings.
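
As a small illustration of the two quantities quoted above, the following Python sketch (not code from the paper) evaluates $r$ for every singular direction together with the direction-independent $\zeta$. The function name `kgsm_coefficients` is hypothetical, $\|A\|_F^2$ is computed as the sum of the squared singular values, and the parameter checks assume the ranges $\beta \in [0,1)$ and $M \in [0,1]$ stated in Theorem 1.1; how $r$ and $\zeta$ enter the closed-form expression is not reproduced here.

```python
def kgsm_coefficients(singular_values, M, beta):
    """Return (r, zeta): r[l] for each singular value sigma_l, and zeta.

    Hypothetical helper evaluating the quantities quoted in the TL;DR:
        r_l  = 1 - sigma_l^2 / ||A||_F^2 + M * (1 - beta)
        zeta = M * (1 - beta)^2   (the same for every direction l)
    """
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must lie in [0, 1)")
    if not 0.0 <= M <= 1.0:
        raise ValueError("M must lie in [0, 1]")
    # ||A||_F^2 equals the sum of the squared singular values.
    fro_sq = sum(s * s for s in singular_values)
    shift = M * (1.0 - beta)  # momentum contribution to every r_l
    r = [1.0 - s * s / fro_sq + shift for s in singular_values]
    zeta = M * (1.0 - beta) ** 2
    return r, zeta


r, zeta = kgsm_coefficients([3.0, 2.0, 1.0], M=0.5, beta=0.5)
print(r, zeta)
```

For singular values $3, 2, 1$ we have $\|A\|_F^2 = 14$; with $M = \beta = 0.5$ this gives $\zeta = 0.125$, and each $r_l$ exceeds its $M = 0$ value $1 - \sigma_l^2/14$ by $M(1-\beta) = 0.25$. Setting $M = 0$ gives $\zeta = 0$ and $r_l = 1 - \sigma_l^2/\|A\|_F^2$, which is largest for the smallest singular value.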

Abstract

This paper studies the effect of adding geometrically smoothed momentum to the randomized Kaczmarz algorithm, which is an instance of stochastic gradient descent on a linear least-squares loss function. We prove a result about the expected error in the direction of singular vectors of the matrix defining the least-squares loss. We present numerical examples illustrating the utility of our result and pose several questions.
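
To make the stochastic-gradient view concrete for the $M = 0$ baseline, the Python sketch below implements standard randomized Kaczmarz (rows sampled with probability $\|a_i\|^2/\|A\|_F^2$); it is not KGSM, whose update is not reproduced in this summary, and the helper names are hypothetical. Each step is a gradient step of size $1/\|a_i\|^2$ on $f_i(x) = \tfrac12(\langle a_i, x\rangle - b_i)^2$, so for a consistent system $\mathbb{E}[x_{k+1} - x \mid x_k] = (I - A^\top A/\|A\|_F^2)(x_k - x)$, and taking the inner product with $v_l$ gives $\mathbb{E}\langle x_k - x, v_l\rangle = (1 - \sigma_l^2/\|A\|_F^2)^k \langle x_0 - x, v_l\rangle$. The second function checks this identity by Monte Carlo on an illustrative diagonal matrix, whose right singular vectors are the coordinate axes.

```python
import random


def randomized_kaczmarz(A, b, x0, iters, rng):
    """Standard randomized Kaczmarz, i.e. the M = 0 baseline (not KGSM).

    A is a list of rows, b the right-hand side, x0 the starting point.
    Row i is drawn with probability ||a_i||^2 / ||A||_F^2, and the iterate is
    projected onto the hyperplane {z : <a_i, z> = b_i}.
    """
    row_norms_sq = [sum(a * a for a in row) for row in A]
    x = list(x0)
    for _ in range(iters):
        # Importance sampling proportional to the squared row norms.
        i = rng.choices(range(len(A)), weights=row_norms_sq)[0]
        a = A[i]
        residual = b[i] - sum(aj * xj for aj, xj in zip(a, x))
        # Gradient step of size 1/||a_i||^2 on f_i(x) = 0.5 * (<a_i, x> - b_i)^2.
        step = residual / row_norms_sq[i]
        x = [xj + step * aj for aj, xj in zip(a, x)]
    return x


def mean_error_smallest_direction(trials, k, seed=0):
    """Monte Carlo estimate of E<x_k - x, v_3> for the illustrative A = diag(3, 2, 1)."""
    # The right singular vectors of a diagonal matrix are the coordinate axes,
    # so <x_k - x, v_3> is the last coordinate of the error (sigma_3 = 1).
    A = [[3.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 1.0]]
    x_true = [1.0, 1.0, 1.0]
    b = [sum(a * t for a, t in zip(row, x_true)) for row in A]  # consistent system
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        xk = randomized_kaczmarz(A, b, [0.0, 0.0, 0.0], k, rng)
        total += xk[2] - x_true[2]
    return total / trials


# The identity predicts (1 - 1/14)^5 * <x_0 - x, v_3> = -(13/14)^5, about -0.690.
estimate = mean_error_smallest_direction(trials=4000, k=5)
predicted = -(1.0 - 1.0 / 14.0) ** 5
print(estimate, predicted)
```

The Monte Carlo estimate should agree with the prediction up to sampling error; the diagonal matrix is chosen only so that the singular vectors are known without an SVD routine.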

Paper Structure (32 sections, 6 theorems, 92 equations, 16 figures, 1 algorithm)

Introduction
Introduction
Motivation
Related work
Main contributions
Main result
Numerical Examples
Notation and preliminaries
Basic Example
Complex perturbation
Exploring the $(M,\beta)$ parameter space
Periodic Spiking behavior
Linear distribution of singular values
Many small singular values
Comparison to $\ell^2$-norm error
...and 17 more sections

Key Result

Theorem 1.1

Fix $\beta \in [0,1)$, $M \in [0,1]$, and $l \in \{1,\ldots,n\}$. Suppose that $x_k$ is defined by KGSM (\ref{eq:our-method2}). For all $k \ge 0$ we have … where …

Figures (16)

Figure 1: The error $|\langle x_k - x, v_n \rangle|$ in the direction of the smallest singular vector $v_n$ for randomized Kaczmarz (\ref{kaczmarz}) and KGSM (\ref{eq:our-method}) (see § \ref{numericscomplex} for a precise description of this numerical example).
Figure 2: The numerical error $|\langle x_k - x, v_{20}\rangle|$ for randomized Kaczmarz (\ref{kaczmarz}) and KGSM (\ref{eq:our-method2}), and the theoretical estimates for $|\mathbb{E} \langle x_k - x, v_{20} \rangle|$ from \ref{kaczmarzstef} and Corollary \ref{coropt}, for the example of § \ref{basicex}.
Figure 3: The numerical error $|\langle x_k - x, v_{20}\rangle|$ for randomized Kaczmarz (\ref{kaczmarz}) and KGSM (\ref{eq:our-method2}), and the theoretical estimates for $|\mathbb{E} \langle x_k - x, v_{20} \rangle|$ from \ref{kaczmarzstef} and Theorem \ref{thm1}, for the example of § \ref{numericscomplex}.
Figure 4: Visualization of the values of $(M,\beta)$ from \ref{parameterspace}. The curve $\beta = 1 - \eta_{20}/(1 - \sqrt{M})^2$ is plotted in blue for reference.
Figure 5: The error $|\langle x_k - x, v_{20}\rangle|$ for randomized Kaczmarz (\ref{kaczmarz}) and KGSM (\ref{eq:our-method2}) for the parameters $(M,\beta)$ indicated by the marker labeling each plot; these markers correspond to those in Figure \ref{fig04}. For the corresponding plots of the $\ell^2$-norm error $\|x_k - x\|_2$ and of $|\langle x_k - x, v_{19} \rangle|$, see § \ref{additionalbetaplots}.
...and 11 more figures

Theorems & Definitions (14)

Theorem 1.1: Main result
Corollary 1.1: $M = 0$
Corollary 1.2: Minimizing $\lambda_1$
Corollary 1.3
Corollary 1.4
Remark 1.1: Limitations
Remark 2.1: Setting the momentum parameter $M$
Remark 2.2: Extending analysis to $\ell^2$-norm error
Lemma 3.1
Proof of Lemma \ref{lem1}
...and 4 more