Worth Their Weight: Randomized and Regularized Block Kaczmarz Algorithms without Preprocessing
Gil Goldshlager, Jiang Hu, Lin Lin
TL;DR
The paper addresses large-scale linear least-squares problems without preprocessing by analyzing the randomized block Kaczmarz method under uniform sampling (RBK-U) and by introducing ReBlocK, a regularized variant of RBK. It shows that RBK-U converges in a Monte Carlo sense to a weighted least-squares solution but can fail when the data contain nearly singular blocks, and that a mild regularization yields robust convergence with controllable bias and variance. For Gaussian data, the analysis gives conditions under which RBK-U recovers the unweighted least-squares solution $x^*$, at faster rates than minibatch stochastic gradient descent (mSGD) when the spectrum decays rapidly. ReBlocK offers practical robustness and efficiency, including in natural gradient applications. Empirically, ReBlocK under uniform sampling (ReBlocK-U) outperforms both RBK-U and mSGD on inconsistent problems, and tail averaging further improves convergence. Together these results support a no-preprocessing, Monte Carlo-based approach to large-scale least-squares problems such as those arising in natural gradient optimization.
Abstract
Due to the ever-growing amounts of data leveraged for machine learning and scientific computing, it is increasingly important to develop algorithms that sample only a small portion of the data at a time. In the case of linear least-squares, the randomized block Kaczmarz method (RBK) is an appealing example of such an algorithm, but its convergence is only understood under sampling distributions that require potentially prohibitively expensive preprocessing steps. To address this limitation, we analyze RBK when the data is sampled uniformly, showing that its iterates converge in a Monte Carlo sense to a $\textit{weighted}$ least-squares solution. Unfortunately, for general problems the bias of the weighted least-squares solution and the variance of the iterates can become arbitrarily large. We show that these quantities can be rigorously controlled by incorporating regularization into the RBK iterations, yielding the regularized algorithm ReBlocK. Numerical experiments, including examples arising from natural gradient optimization, demonstrate that ReBlocK can outperform both RBK and minibatch stochastic gradient descent for inconsistent problems with rapidly decaying singular values.
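To make the two iterations concrete, the following is a minimal NumPy sketch of block Kaczmarz with uniformly sampled row blocks. It is an illustration under stated assumptions, not the authors' implementation: the function name `block_kaczmarz_uniform`, its parameters, and in particular the scaling of the regularization term (`lam * block_size`) are hypothetical choices made here. With `lam = 0` each step applies the pseudoinverse of the sampled block, which is the standard RBK update; with `lam > 0` the pseudoinverse is replaced by a Tikhonov-regularized solve, which is one common way to regularize the step.

```python
import numpy as np


def block_kaczmarz_uniform(A, b, block_size, n_iters, lam=0.0, seed=0):
    """Block Kaczmarz with uniformly sampled row blocks (illustrative sketch).

    lam == 0: x <- x + pinv(A_S) (b_S - A_S x)            (plain RBK step)
    lam  > 0: x <- x + A_S^T (A_S A_S^T + lam*k*I)^{-1} (b_S - A_S x)
              where k = block_size; this scaling is an assumption.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x = np.zeros(n)
    for _ in range(n_iters):
        # Uniform sampling: no row norms or other preprocessing required.
        idx = rng.choice(m, size=block_size, replace=False)
        A_S = A[idx]
        r_S = b[idx] - A_S @ x  # residual on the sampled block
        if lam > 0.0:
            # Regularized step: the shifted Gram matrix is always invertible,
            # so nearly singular blocks cannot produce huge updates.
            G = A_S @ A_S.T + lam * block_size * np.eye(block_size)
            x = x + A_S.T @ np.linalg.solve(G, r_S)
        else:
            # Unregularized step: project onto the block's solution set.
            x = x + np.linalg.pinv(A_S) @ r_S
    return x


# Usage on a small synthetic, consistent Gaussian problem (assumed test data).
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 20))
x_true = rng.standard_normal(20)
b = A @ x_true
x_rbk = block_kaczmarz_uniform(A, b, block_size=10, n_iters=500)
x_reg = block_kaczmarz_uniform(A, b, block_size=10, n_iters=500, lam=1e-3)
```

On a consistent problem like this one, both variants share the same fixed point, so the example only shows the mechanics of the update. The differences described in the abstract (bias toward a weighted solution, variance from ill-conditioned blocks) arise for inconsistent problems.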
