Beyond Expectation: Concentration Inequalities for Randomized Iterative Methods

Toby Anderson, Max Collins, Jamie Haddock, Jackie Lok, Elizaveta Rebrova

TL;DR

The paper addresses the near-worst-case behavior of stochastic iterative methods by deriving concentration and variance bounds for the error, beyond traditional expectation guarantees. It develops a tensor-lifting framework to bound higher-order moments of linear update errors $\mathbf{e}_k = \mathbf{Y}_k \mathbf{e}_{k-1}$, introducing key quantities $\mu = \|\mathbb{E}[ (\mathbf{Y}^T\mathbf{Y})^{\otimes 2}]\|$ and $\eta = \lambda_{\min}(\mathbb{E}[\mathbf{Y}^T\mathbf{Y}])$, which yield $\mathrm{Var}(\|\mathbf{e}_k\|^2) \le (\mu^k - \eta^k) \|\mathbf{e}_0\|^4$ and enable Chebyshev-type concentration and high-probability bounds. Specializing to randomized Kaczmarz and randomized Gauss-Seidel, the authors obtain explicit bounds in terms of singular values of $\mathbf{A}$, and they extend the analysis to nonlinear updates such as RK for linear inequalities. Complementary empirical results illustrate the bounds' behavior under varying conditioning and problem structure. These results provide confidence intervals and trajectory-wide probabilistic guarantees, informing algorithm design and diagnostic use for data-consistency and objective landscape issues in large-scale problems.
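To see how the variance bound turns into a confidence interval, the following sketch applies Chebyshev's inequality to the bound exactly as stated in the summary above. The threshold $t$ and failure probability $\delta$ are notation introduced here for illustration and are not taken from the paper.

```latex
\begin{align*}
\mathbb{P}\Big( \big| \|\mathbf{e}_k\|^2 - \mathbb{E}\|\mathbf{e}_k\|^2 \big| \ge t \Big)
  &\le \frac{\mathrm{Var}(\|\mathbf{e}_k\|^2)}{t^2}
   \le \frac{(\mu^k - \eta^k)\,\|\mathbf{e}_0\|^4}{t^2}, \\
\text{so with probability at least } 1 - \delta: \quad
\big| \|\mathbf{e}_k\|^2 - \mathbb{E}\|\mathbf{e}_k\|^2 \big|
  &\le \sqrt{\frac{\mu^k - \eta^k}{\delta}}\;\|\mathbf{e}_0\|^2 .
\end{align*}
```

For example, $\delta = 0.25$ gives a half-width of $2\sqrt{\mu^k - \eta^k}\,\|\mathbf{e}_0\|^2$ (a 75% interval), while $\delta = 0.05$ gives $\sqrt{20\,(\mu^k - \eta^k)}\,\|\mathbf{e}_0\|^2$ (a 95% interval).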

Abstract

Stochastic iterative methods are useful in a variety of large-scale numerical linear algebraic, machine learning, and statistical problems, in part due to their low-memory footprint. They are frequently used in a variety of applications, and thus it is imperative to have a thorough theoretical understanding of their behavior. Most theoretical convergence results for stochastic iterative methods provide bounds on the expected error of the iterates, and yield a type of average case analysis. However, understanding the behavior of these methods in the near-worst-case is desirable. For stochastic methods, this motivates providing bounds on the variance and concentration of their error, which can be used to generate confidence intervals around the bounds on their expected error. Here, we provide upper bounds for the concentration and variance of the error of a general class of linear stochastic iterative methods, including the randomized Kaczmarz method and the randomized Gauss--Seidel method, and a more general class of nonlinear stochastic iterative methods, including the randomized Kaczmarz method for systems of linear inequalities.

Paper Structure

This paper contains 17 sections, 5 theorems, 61 equations, 4 figures.

Key Result

Lemma 1.1

Consider a stochastic process $\{ \mathbf{x}_k : k \in \mathbb{N} \}$ approximating an element of nonempty convex $S \subset \mathbb{R}^n$, where $\mathbf{x}_k = f_{i_k}(\mathbf{x}_{k-1})$ and $f_{i_k}$ is independently and randomly selected from a set $F = \{f_1, f_2, \cdots, f_m\}$ at each time $k$, and $d(\mathbf{x},S) := \inf_{\mathbf{s} \in S} \|\mathbf{x} - \mathbf{s}\|$ is defined with respec

Figures (4)

  • Figure 1: Visualization of errors of 500 independent trials of RK. The plot shows the empirical mean error (white dashed line), the bound \ref{eq:RKrate} (black solid line), and the 75% (red dashed lines) and 95% (green dashed lines) confidence intervals for the error, derived by combining Chebyshev's inequality with Theorem \ref{thm:main_linear} and \ref{eq:mu_eta_bound}.
  • Figure 2: The relationship between $\mu$ and the RK convergence rate, $r = 1 - \frac{\sigma_{\min}^2(\bm{A})}{\|\bm{A}\|_F^2}$. For each cell, we generate five $m \times n$ row-normalized Gaussian matrices, compute the $\mu$ parameter as in \ref{eq:muforRK}, and plot the average value of $\log_r(\mu)$ across the five trials. When $n > m$, we note that $\log_r(\mu) = 0$.
  • Figure 3: (Left) Visualization of errors of 500 independent trials of RK applied to $\bm{A} \bm{x} = \bm{b}$, where $\bm{A} \in \mathbb{R}^{1000 \times 20}$ has singular values given in the subfigure captions (pink-blue gradient indicates quantiles of errors). The plot shows the empirical mean error (white dashed line), the bound \ref{eq:RKrate} (black solid line), and the 75% (red dashed lines) and 95% (green dashed lines) confidence intervals for the error, derived by combining Chebyshev's inequality with Theorem \ref{thm:main_linear} and \ref{eq:mu_eta_bound}. (Right) Spectral profile for $\bm{A}$.
  • Figure 4: Comparison of the upper bounds on the probability $\mathbb{P}(\|\mathbf{e}_k\|^{2} - \mathbb{E}\|\mathbf{e}_k\|^2 \geq t)$, for various $k$ and $t$, resulting from Theorem \ref{thm:main_linear} (top, our contribution), from Lemma \ref{lem:markov} (\ref{eq:markov}) (middle, simple Markov inequality bound), and from \ref{eq:MatrixConc}, which is a consequence of \cite{huang2022matrix} (bottom, matrix concentration), for RK applied to various matrices. Each cell corresponds to the upper bound on the probability of the squared error exceeding its mean by $t$ after $k$ iterations. Darker cells correspond to smaller values, i.e., better concentration bounds.
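The kind of experiment described in the Figure 1 caption can be imitated with a small simulation. The sketch below is a minimal, assumption-laden illustration and not the authors' code: it runs randomized Kaczmarz on a small consistent Gaussian system, then builds Chebyshev bands from the empirical variance across trials rather than from the paper's $\mu$ and $\eta$ bound. The problem size (50 × 5), trial count (100), iteration count (40), and all function names are choices made here for illustration.

```python
import math
import random


def randomized_kaczmarz_errors(A, x_star, num_iters, rng):
    """Run RK on the consistent system A x = A x_star, starting from x_0 = 0.

    Returns the squared errors ||x_k - x_star||^2 for k = 0..num_iters.
    Rows are sampled with probability proportional to their squared norms.
    """
    b = [sum(a * s for a, s in zip(row, x_star)) for row in A]
    row_norms_sq = [sum(a * a for a in row) for row in A]
    x = [0.0] * len(x_star)
    errors = [sum((xi - si) ** 2 for xi, si in zip(x, x_star))]
    for _ in range(num_iters):
        i = rng.choices(range(len(A)), weights=row_norms_sq)[0]
        residual = b[i] - sum(a * xi for a, xi in zip(A[i], x))
        step = residual / row_norms_sq[i]
        # Project the iterate onto the hyperplane of the sampled equation.
        x = [xi + step * a for xi, a in zip(x, A[i])]
        errors.append(sum((xi - si) ** 2 for xi, si in zip(x, x_star)))
    return errors


def chebyshev_half_width(variance, confidence):
    """Half-width t such that P(|X - E X| >= t) <= 1 - confidence."""
    return math.sqrt(variance / (1.0 - confidence))


def empirical_bands(trials, confidence):
    """Per-iteration (mean, lower, upper) from a list of error trajectories.

    Uses the empirical (population) variance across trials; this is an
    assumption of the sketch, standing in for a theoretical variance bound.
    """
    bands = []
    for k in range(len(trials[0])):
        values = [trial[k] for trial in trials]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        t = chebyshev_half_width(var, confidence)
        bands.append((mean, max(mean - t, 0.0), mean + t))
    return bands


rng = random.Random(0)  # fixed seed so the run is reproducible
m, n = 50, 5
A = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
x_star = [rng.gauss(0.0, 1.0) for _ in range(n)]
trials = [randomized_kaczmarz_errors(A, x_star, 40, rng) for _ in range(100)]
bands_75 = empirical_bands(trials, 0.75)
bands_95 = empirical_bands(trials, 0.95)
```

Each entry of `bands_75` and `bands_95` holds the empirical mean squared error at iteration $k$ together with a Chebyshev lower and upper limit. Replacing the empirical variance with a theoretical bound would give bands that hold without running any trials, which is the role the paper's variance bound plays.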

Theorems & Definitions (15)

  • Lemma 1.1
  • Theorem 1.2
  • Theorem 1.3
  • Theorem 1.4
  • Theorem 2.1
  • Proof
  • Proof of Theorem \ref{thm:main_linear}
  • Remark 1
  • Remark 2
  • Remark 3
  • ...and 5 more