Randomized Kaczmarz with geometrically smoothed momentum

Seth J. Alderman, Roan W. Luikart, Nicholas F. Marshall

TL;DR

The paper introduces randomized Kaczmarz with geometrically smoothed momentum (KGSM) for solving linear least-squares problems and proves a directional convergence result for the expected error along singular vectors, extending prior work on directional decay. The KGSM update adds a momentum term controlled by a momentum factor $M$ and geometrically smoothed by a parameter $\beta$, which yields closed-form behavior for $\mathbb{E}\langle x_{k+1}-x, v_l\rangle$ in terms of $r=1-\sigma_l^2/\|A\|_F^2+M(1-\beta)$ and $\zeta=M(1-\beta)^2$. The authors show that, for $M$ and $\beta$ in suitable ranges, KGSM can accelerate convergence in directions associated with small singular values, that it reduces to standard randomized Kaczmarz when $M=0$, and that the smoothing can be optimized through the choice of $\beta$. Numerical experiments illustrate the dynamics, including complex-eigenvalue regimes and spiking behavior. The work motivates further study of adaptive parameter selection, extensions to Nesterov-like schemes, and block methods, with implications for faster linear-system solvers and for understanding momentum-based stochastic optimization in linear settings.

Abstract

This paper studies the effect of adding geometrically smoothed momentum to the randomized Kaczmarz algorithm, which is an instance of stochastic gradient descent on a linear least squares loss function. We prove a result about the expected error in the direction of singular vectors of the matrix defining the least squares loss. We present several numerical examples illustrating the utility of our result and pose several questions.
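
To make the setting concrete, here is a minimal Python sketch of randomized Kaczmarz with an optional momentum term. The row sampling (probability proportional to squared row norm) and the projection step are the standard randomized Kaczmarz iteration. The momentum part is an assumption: this page does not reproduce the KGSM update, so the auxiliary vector `y`, the smoothing rule used for it, and the function name `kgsm_sketch` are illustrative choices rather than the paper's stated method. With `M = 0` the momentum term vanishes and the loop is plain randomized Kaczmarz.

```python
import numpy as np

def kgsm_sketch(A, b, M=0.0, beta=0.0, iters=1000, seed=0):
    """Randomized Kaczmarz with an ASSUMED smoothed-momentum term.

    M = 0 gives standard randomized Kaczmarz. The momentum update below
    is a hypothetical form chosen for illustration, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    row_sq = np.einsum("ij,ij->i", A, A)      # squared Euclidean norm of each row
    probs = row_sq / row_sq.sum()             # sample row i with prob ||a_i||^2 / ||A||_F^2
    x = np.zeros(n)                           # current iterate x_k
    y = np.zeros(n)                           # assumed smoothed momentum vector y_k
    for _ in range(iters):
        i = rng.choice(m, p=probs)
        # Standard Kaczmarz step: project x onto the hyperplane <a_i, x> = b_i.
        step = (b[i] - A[i] @ x) / row_sq[i] * A[i]
        x_new = x + step + M * y
        # Assumed smoothing: geometric average of past displacements.
        y = beta * y + (1.0 - beta) * (x_new - x)
        x = x_new
    return x
```

For a consistent system `A @ x_true = b`, calling `kgsm_sketch(A, b)` with the default `M = 0` should drive `x` toward `x_true`; runs with nonzero `M` and `beta` can then be compared against that baseline.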

Paper Structure (32 sections, 6 theorems, 92 equations, 16 figures, 1 algorithm)


Key Result

Theorem 1.1

Fix $\beta \in [0,1)$, $M \in [0,1]$, and $l \in \{1,\ldots,n\}$. Suppose that $x_k$ is defined by KGSM \ref{eq:our-method2}. For all $k \ge 0$ we have where
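
The displayed formula of the theorem is not reproduced on this page, so the following Python sketch only explores the two quantities named in the TL;DR, $r=1-\sigma_l^2/\|A\|_F^2+M(1-\beta)$ and $\zeta=M(1-\beta)^2$. Everything beyond those two definitions is an assumption: if the momentum vector is updated as $y_{k+1}=\beta y_k+(1-\beta)(x_{k+1}-x_k)$ and enters the iterate as $+M y_k$, then the expected error along $v_l$ obeys a two-term linear recurrence whose eigenvalues are $\lambda_{1,2}=\tfrac12\big(r+\beta\pm\sqrt{(r-\beta)^2-4\zeta}\big)$. The helper name `directional_rates` and the argument `eta` (standing for $\sigma_l^2/\|A\|_F^2$) are hypothetical.

```python
import cmath

def directional_rates(eta, M, beta):
    """Return (r, zeta, disc, lam1, lam2) for eta = sigma_l^2 / ||A||_F^2.

    r and zeta follow the definitions in the TL;DR. The discriminant and
    eigenvalues come from an ASSUMED two-term recurrence, not from the
    theorem statement, which is not shown on this page.
    """
    r = 1.0 - eta + M * (1.0 - beta)
    zeta = M * (1.0 - beta) ** 2
    disc = (r - beta) ** 2 - 4.0 * zeta   # negative => complex-conjugate eigenvalues
    root = cmath.sqrt(disc)               # cmath handles negative discriminants
    lam1 = (r + beta + root) / 2.0
    lam2 = (r + beta - root) / 2.0
    return r, zeta, disc, lam1, lam2
```

Under this assumption, `M = 0` gives eigenvalues $1-\sigma_l^2/\|A\|_F^2$ and $\beta$, the first of which is the known randomized Kaczmarz decay rate along $v_l$. The discriminant vanishes when $\beta = 1-\eta/(1-\sqrt{M})^2$, which has the same form as the reference curve in the caption of Figure 4; this supports the assumed recurrence but does not confirm it.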

Figures (16)

  • Figure 1: The error $|\langle x_k - x, v_n \rangle|$ in the direction of the smallest singular vector $v_n$ for randomized Kaczmarz \ref{kaczmarz} and KGSM \ref{eq:our-method} (see § \ref{numericscomplex} for a precise description of this numerical example).
  • Figure 2: The numerical error $|\langle x_k -x,v_{20}\rangle|$ for randomized Kaczmarz \ref{kaczmarz} and KGSM \ref{eq:our-method2}, and the theoretical estimates for $|\mathbb{E} \langle x_k - x, v_{20} \rangle|$ from \ref{kaczmarzstef} and Corollary \ref{coropt}, for the example of § \ref{basicex}.
  • Figure 3: The numerical error $|\langle x_k -x,v_{20}\rangle|$ for randomized Kaczmarz \ref{kaczmarz} and KGSM \ref{eq:our-method2}, and the theoretical estimates for $|\mathbb{E} \langle x_k - x, v_{20} \rangle|$ from \ref{kaczmarzstef} and Theorem \ref{thm1}, for the example in § \ref{numericscomplex}.
  • Figure 4: Visualization of the values of $(M,\beta)$ from \ref{parameterspace}. The curve $\beta = 1 - \eta_{20}/(1 - \sqrt{M})^2$ is plotted for reference in blue.
  • Figure 5: The error $|\langle x_k -x,v_{20}\rangle|$ for randomized Kaczmarz \ref{kaczmarz} and KGSM \ref{eq:our-method2} for parameters $(M,\beta)$ indicated by markers labeling each plot, which correspond to the markers in Figure \ref{fig04}. For corresponding plots of the $\ell^2$-norm error $\|x_k - x \|_2$ and $|\langle x_k -x, v_{19} \rangle|$, see § \ref{additionalbetaplots}.
  • ...and 11 more figures

Theorems & Definitions (14)

  • Theorem 1.1: Main result
  • Corollary 1.1: $M = 0$
  • Corollary 1.2: Minimizing $\lambda_1$
  • Corollary 1.3
  • Corollary 1.4
  • Remark 1.1: Limitations
  • Remark 2.1: Setting the momentum parameter $M$
  • Remark 2.2: Extending analysis to $\ell^2$-norm error
  • Lemma 3.1
  • Proof: Proof of Lemma \ref{lem1}
  • ...and 4 more