Robust Randomized Low-Rank Approximation with Row-Wise Outlier Detection
Aidan Tiruvan
TL;DR
This work tackles robust low-rank approximation under a row-wise outlier model, in which a fraction of the rows can be adversarially corrupted. It introduces a one-pass, scalable algorithm that first sketches the data with a Johnson–Lindenstrauss projection and then applies median/MAD-based thresholding to the projected row norms to remove outlier rows before performing a randomized SVD on the retained rows. Theoretical guarantees show that, with high probability, the error of the recovered rank-$k$ approximation is within a controllable additive term of the error of the best rank-$k$ approximation of the clean data. Empirical results demonstrate strong outlier detection and substantial speedups over robust baselines. The approach offers a practical, parallelizable alternative to convex or iterative robust PCA methods for large-scale datasets with row-wise corruption.
Abstract
Robust low-rank approximation under row-wise adversarial corruption can be achieved with a single-pass randomized procedure that detects and removes outlier rows by thresholding their projected norms. We propose a scalable, non-iterative algorithm that efficiently recovers the underlying low-rank structure in this setting. By first compressing the data with a Johnson–Lindenstrauss projection, our approach approximately preserves the norms of the clean rows while dramatically reducing dimensionality. Robust statistics based on the median and the median absolute deviation (MAD) of the projected row norms then identify and remove outlier rows with abnormally large norms. The subsequent rank-$k$ approximation achieves near-optimal error bounds with a one-pass procedure whose cost scales linearly with the number of observations. Empirical results confirm that combining random sketches with robust statistics yields efficient, accurate decompositions even when a large fraction of the rows is corrupted.
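
As a rough illustration of the pipeline described above, the following NumPy sketch projects the rows, thresholds their projected norms using the median and MAD, and runs a basic randomized SVD on the retained rows. The function name `robust_low_rank`, the Gaussian projection, the parameters `sketch_dim`, `c`, and `oversample`, and the 1.4826 MAD scaling constant are illustrative assumptions rather than the paper's exact choices. For clarity, the sketch reads the in-memory matrix more than once instead of implementing a true one-pass stream.

```python
import numpy as np


def robust_low_rank(A, k, sketch_dim=32, c=3.0, oversample=10, seed=0):
    """Illustrative sketch: JL projection, median/MAD row filtering, randomized SVD.

    All parameter names and defaults are assumptions made for this example.
    Returns (keep, U, s, Vt), where `keep` is a boolean mask over the rows of A
    and U @ diag(s) @ Vt is a rank-k approximation of A[keep].
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape

    # Johnson-Lindenstrauss sketch: a scaled Gaussian matrix approximately
    # preserves the Euclidean norm of each row.
    S = rng.standard_normal((d, sketch_dim)) / np.sqrt(sketch_dim)
    norms = np.linalg.norm(A @ S, axis=1)

    # Robust location and scale of the projected row norms.
    med = np.median(norms)
    mad = np.median(np.abs(norms - med))

    # Assumed threshold rule: flag rows whose projected norm exceeds
    # median + c * 1.4826 * MAD (1.4826 makes the MAD comparable to a
    # standard deviation under a Gaussian model).
    threshold = med + c * 1.4826 * mad
    keep = norms <= threshold
    A_clean = A[keep]

    # Basic randomized SVD on the retained rows (Gaussian range finder).
    ell = min(k + oversample, d, A_clean.shape[0])
    Omega = rng.standard_normal((d, ell))
    Q, _ = np.linalg.qr(A_clean @ Omega)      # orthonormal basis for the range
    B = Q.T @ A_clean                         # small (ell x d) matrix
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small

    return keep, U[:, :k], s[:k], Vt[:k]
```

Calling `robust_low_rank(A, k=5)` on an `n x d` array returns the boolean mask of retained rows together with the factors of a rank-5 approximation of those rows. The multiplier `c` trades off false positives on clean rows against missed outliers, so it would need tuning for a given corruption level.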
