Read Between the Hyperplanes: On Spectral Projection and Sampling Approaches to Randomized Kaczmarz

James Nguyen, Oleg Presnyakov, Adityakrishnan Radhakhrishnan

TL;DR

The paper accelerates randomized Kaczmarz methods for ill-conditioned, overdetermined linear systems through three complementary approaches: (i) directionally aware projections using pairwise row differences, (ii) coreset construction via clustering to reduce problem size while preserving subspace geometry, and (iii) spectral-direction-aware sampling that increases the likelihood of selecting rows aligned with underrepresented singular directions. Empirical results show that pairwise-difference augmentation can reduce approximation and Chebyshev errors, that coreset-based clustering can preserve left-subspace geometry with smaller row subsets, and that spectral weighting can significantly speed convergence along the least represented singular direction, although it requires knowledge of the singular vectors. The findings highlight trade-offs between convergence speed and accuracy across methods, with Hadamard SKM often performing best under severe ill-conditioning. The work points to practical pathways for scalable, robust RK/SKM in large-scale settings and motivates SVD-free approximations of spectral information for real-time adaptation.

Abstract

Among recent developments centered around Randomized Kaczmarz (RK), a row-sampling iterative projection method for large-scale linear systems, several adaptations of the method have yielded faster convergence. Focusing solely on ill-conditioned and overdetermined linear systems, we highlight inter-row relationships that can be leveraged to guide directionally aware projections. In particular, we find that improved convergence rates can be achieved by (i) projecting onto pairwise row differences, (ii) sampling from partitioned clusters of nearly orthogonal rows, or (iii) more frequently sampling spectrally diverse rows.
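
For reference, the baseline that the abstract builds on can be sketched as follows. This is a minimal pure-Python sketch of classical RK with sampling proportional to squared row norms, not any of the paper's modified variants; the function name `randomized_kaczmarz`, the toy system, and the iteration count are assumptions chosen for illustration.

```python
import random


def randomized_kaczmarz(A, b, iters=2000, seed=0):
    """Classical RK: pick row i with probability ||a_i||^2 / ||A||_F^2,
    then project the iterate onto the hyperplane <a_i, x> = b_i."""
    rng = random.Random(seed)
    n = len(A[0])
    sq_norms = [sum(v * v for v in row) for row in A]
    x = [0.0] * n
    for _ in range(iters):
        # Row index drawn with weights equal to the squared row norms.
        i = rng.choices(range(len(A)), weights=sq_norms)[0]
        a = A[i]
        residual = sum(aj * xj for aj, xj in zip(a, x)) - b[i]
        step = residual / sq_norms[i]
        # Orthogonal projection onto the selected hyperplane.
        x = [xj - step * aj for aj, xj in zip(a, x)]
    return x


# Toy consistent, overdetermined system (hypothetical data) with x* = (1, -2).
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
x_star = [1.0, -2.0]
b = [sum(aj * xj for aj, xj in zip(row, x_star)) for row in A]

x = randomized_kaczmarz(A, b)
max_error = max(abs(xi - si) for xi, si in zip(x, x_star))
```

On a well-conditioned toy system like this one the iterate reaches the solution quickly; the ill-conditioned setting targeted by the paper is where plain RK slows down and the three proposed modifications become relevant.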

Paper Structure

This paper contains 16 sections, 14 equations, 5 figures, 1 table, 2 algorithms.

Figures (5)

  • Figure 1: Quantitative results for $A' \in \mathbb R^{240\times 12}$, averaged over 100 trials: (i) Iteration vs. Average Approximation Error, (ii) Iteration vs. Average Chebyshev Error, and (iii) Iteration vs. Average Accuracy (left to right). All results obtained by applying SKM using $\beta = 3$, with no over-projection parameter $(\lambda = 1)$. The plots show metrics for sampling schemes 1, 2, and 3, labeled $A'$, $A'\cup P$, and $P$, respectively.
  • Figure 1: Comparison of coreset performance under varying condition numbers. (Left) Relative solution error between the full least-squares solution $x_A$ and the coreset solution $x_B$ as a function of the coreset factor $c$. Increasing $c$ consistently improves the approximation quality, even for ill-conditioned systems ($\kappa(A)=10^7$). (Right) Residual norms for the full and coreset systems plotted on a logarithmic scale. Dashed lines denote residuals $\|A x_B - b\|$, while solid lines correspond to $\|B x_B - b_B\|$. The stability of residual magnitudes across condition numbers indicates that the reduced system preserves the dominant geometry of the original problem.
  • Figure 1: $A \in \mathbb R^{240 \times 12}$ is constructed as described in the numerical experimentation subsection. The figure shows the convergence of $|\langle x_k-x^*, v_j\rangle|$ for each singular vector $v_j ~(j = 1,\dots, n)$ at each iteration $k$. The components converge in order of decreasing singular value.
  • Figure 2: Convergence of SKM variants on an ill-conditioned linear system with $(m,n) = (2000,20)$ and condition number $\kappa(A)\approx 10^{7}$ over $2000$ iterations. The blue curve corresponds to the original Hadamard SKM, the orange curve to the reduced-matrix (proof-of-concept) variant using a coreset of size $c=5$, the green curve to the clustering-based reduction ("$\varepsilon$-cover") approach, and the red curve to the online clustering / linear-dependence reduction scheme. The figure illustrates that the reduced-matrix variant closely tracks the baseline, while clustering-based variants exhibit slower convergence and larger approximation error.
  • Figure 2: A matrix $A \in \mathbb R^{240 \times 12}$ is constructed as described in the numerical experimentation subsection. Results are averaged over 50 different random matrices. The left plot shows the directional convergence $|\langle x_k-x^*, v_n\rangle|$ for each algorithm. The right plot demonstrates the improved convergence rate for weighted sampling.
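
The first caption above refers to SKM with sample size $\beta = 3$ and $\lambda = 1$, run on the rows $A'$, on $A' \cup P$, and on $P$ alone. The setup can be sketched roughly as below, under stated assumptions: this is plain SKM (sample $\beta$ rows, project onto the one with the largest absolute residual), not the Hadamard variant; $P$ is built here from all pairs $i < j$, which may differ from the paper's construction; and the helper names `pairwise_differences` and `skm` and the toy data are hypothetical. The sketch relies on the fact that, for a consistent system, $(a_i - a_j)^\top x^* = b_i - b_j$, so difference rows remain consistent with the same solution.

```python
import random
from itertools import combinations


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def pairwise_differences(A, b):
    """Rows a_i - a_j with right-hand sides b_i - b_j for all i < j.
    Assumption: the paper's set P may be built or filtered differently."""
    P, bP = [], []
    for i, j in combinations(range(len(A)), 2):
        P.append([ai - aj for ai, aj in zip(A[i], A[j])])
        bP.append(b[i] - b[j])
    return P, bP


def skm(A, b, beta=3, lam=1.0, iters=3000, seed=0):
    """Plain SKM: sample beta rows uniformly without replacement, pick the
    one with the largest absolute residual, and take a projection step
    scaled by lam (lam = 1 is an exact projection)."""
    rng = random.Random(seed)
    x = [0.0] * len(A[0])
    beta = min(beta, len(A))
    for _ in range(iters):
        sample = rng.sample(range(len(A)), beta)
        i = max(sample, key=lambda k: abs(dot(A[k], x) - b[k]))
        sq = dot(A[i], A[i])
        if sq == 0.0:
            continue  # skip zero rows, e.g. differences of duplicate rows
        r = dot(A[i], x) - b[i]
        x = [xj - lam * (r / sq) * aj for xj, aj in zip(x, A[i])]
    return x


def max_err(x, x_ref):
    return max(abs(xi - ri) for xi, ri in zip(x, x_ref))


# Toy consistent system (hypothetical data) with x* = (1, -2).
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
x_star = [1.0, -2.0]
b = [dot(row, x_star) for row in A]
P, bP = pairwise_differences(A, b)

x_A = skm(A, b)             # scheme 1: original rows only
x_AP = skm(A + P, b + bP)   # scheme 2: original rows plus differences
x_P = skm(P, bP)            # scheme 3: difference rows only
```

Note that the difference rows alone span at most the differences of the original rows, so scheme 3 recovers $x^*$ only when $P$ has full column rank, as it does in this toy example.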