Quantile Randomized Kaczmarz Algorithm with Whitelist Trust Mechanism
Sofiia Shvaiko, Longxiu Huang, Elizaveta Rebrova
TL;DR
The paper tackles robustly solving overdetermined linear systems $\mathbf{A}\mathbf{x}^\star=\mathbf{b}$ when observed labels are corrupted as $\tilde{\mathbf{b}}=\mathbf{b}+\boldsymbol{\varepsilon}$ with $\|\boldsymbol{\varepsilon}\|_0\le \beta m$. It reanalyzes QuantileRK (QRK) and introduces WhiteList QuantileRK (WL-QRK), a lightweight online detector with a whitelist/blocklist mechanism that screens rows using two residual thresholds and reintroduces rows when trustworthy, while subsampling residuals to reduce per-iteration cost to $\mathcal{O}(n)+\mathcal{O}(t)$ with $t\ll m$. Theoretical results include a refined QRK convergence rate that improves as $\beta$ decreases and a residual-based identifiability lemma showing that top residuals eventually concentrate on corrupted equations. Empirically, WL-QRK outperforms RK and QRK on synthetic and real imaging data, including tomography and Wisconsin Breast Cancer problems, demonstrating faster convergence and robustness to sparse large-scale corruptions. This work thus offers a practical, scalable approach to robust linear solving in the presence of adversarial row-wise noise.
Abstract
Randomized Kaczmarz (RK) is a simple and fast solver for consistent overdetermined systems, but it is known to be fragile under noise. We study overdetermined $m\times n$ linear systems with a sparse set of corrupted equations, ${\bf A}{\bf x}^\star = {\bf b}$, where only $\tilde{\bf b} = {\bf b} + \boldsymbol{\varepsilon}$ is observed with $\|\boldsymbol{\varepsilon}\|_0 \le \beta m$. The recently introduced QuantileRK (QRK) algorithm addresses this issue by testing residuals against a quantile threshold, but computing a per-iteration quantile across many rows is costly. In this work we (i) reanalyze QRK and show that its convergence rate improves monotonically as the corruption fraction $\beta$ decreases; (ii) propose a simple online detector that flags and removes unreliable rows, which reduces the effective $\beta$ and speeds up convergence; and (iii) make the method practical by estimating quantiles from a small random subsample of rows, preserving robustness while lowering the per-iteration cost. Simulations on imaging and synthetic data demonstrate the efficiency of the proposed method.
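
To make the quantile test in item (iii) concrete, the following is a minimal sketch of a Kaczmarz iteration whose acceptance threshold is estimated from a random subsample of rows. It is an illustration under stated assumptions, not the paper's WL-QRK: the function name `subsampled_quantile_rk`, the uniform row sampling, and the default values of `q`, `t`, and `n_iters` are all hypothetical, and the whitelist/blocklist detector is omitted. The sketch also does not reproduce the paper's per-iteration cost accounting.

```python
import numpy as np


def subsampled_quantile_rk(A, b_obs, q=0.7, t=50, n_iters=5000, seed=0):
    """Illustrative Kaczmarz iteration with a subsampled quantile test.

    A      : (m, n) matrix with nonzero rows
    b_obs  : (m,) observed, possibly corrupted right-hand side
    q      : quantile level for the acceptance threshold (assumed value)
    t      : number of rows sampled to estimate the quantile (assumed value)
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    row_norms_sq = np.einsum("ij,ij->i", A, A)  # squared norm of each row
    x = np.zeros(n)
    for _ in range(n_iters):
        # Estimate the q-quantile of absolute residuals from t random rows.
        sample = rng.choice(m, size=min(t, m), replace=False)
        threshold = np.quantile(np.abs(A[sample] @ x - b_obs[sample]), q)

        # Draw a candidate row uniformly (an assumption of this sketch).
        i = rng.integers(m)
        r = A[i] @ x - b_obs[i]

        # Project onto the row's hyperplane only if its residual is small
        # enough; rows with large residuals are treated as suspect and skipped.
        if abs(r) <= threshold:
            x -= (r / row_norms_sq[i]) * A[i]
    return x
```

For example, on a hypothetical $200\times 10$ Gaussian system with 10 right-hand-side entries shifted by a large constant, calling `subsampled_quantile_rk(A, b_obs)` skips the shifted rows because their residuals stay above the estimated threshold, so the iterate approaches ${\bf x}^\star$ rather than a corrupted solution.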
