Robust Randomized Low-Rank Approximation with Row-Wise Outlier Detection

Aidan Tiruvan

TL;DR

This work tackles robust low-rank approximation under a row-wise outlier model, in which a fraction of the rows can be adversarially corrupted. It introduces a one-pass, scalable algorithm that first sketches the data with a Johnson–Lindenstrauss projection and then uses median/MAD-based thresholding to remove outlier rows before performing a randomized SVD on the clean subset. Theoretical guarantees show that, with high probability, the recovered rank-$k$ approximation closely matches the best low-rank approximation of the clean data, up to a controllable additive error, while empirical results demonstrate strong outlier detection and substantial speedups over robust baselines. The approach offers a practical, parallelizable alternative to convex or iterative robust PCA methods for large-scale datasets with row-wise corruption.

Abstract

Robust low-rank approximation under row-wise adversarial corruption can be achieved with a single-pass randomized procedure that detects and removes outlier rows by thresholding their projected norms. We propose a scalable, non-iterative algorithm that efficiently recovers the underlying low-rank structure under such corruption. By first compressing the data with a Johnson–Lindenstrauss projection, our approach preserves the geometry of clean rows while dramatically reducing dimensionality. Robust statistical techniques based on the median and median absolute deviation (MAD) then enable precise identification and removal of outlier rows with abnormally high norms. The subsequent rank-$k$ approximation achieves near-optimal error bounds with a one-pass procedure that scales linearly with the number of observations. Empirical results confirm that combining random sketches with robust statistics yields efficient, accurate decompositions even in the presence of large fractions of corrupted rows.
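The three stages described above (sketch, threshold, factorize) can be illustrated with a minimal NumPy sketch. This is not the paper's reference implementation: the Gaussian sketching matrix, the threshold form `median + c * MAD`, the function name `robust_lowrank_sketch`, and the parameters `sketch_dim` and `oversample` are all assumptions made for illustration, and the randomized SVD step is a basic range-finder variant that touches the retained rows more than once.

```python
import numpy as np

def robust_lowrank_sketch(A, k, sketch_dim=50, c=3.0, oversample=10, seed=0):
    """Illustrative pipeline: JL sketch -> median/MAD row filter -> randomized SVD.

    All parameter names and defaults are hypothetical choices for this sketch.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape

    # 1) Johnson-Lindenstrauss projection (assumed Gaussian), scaled so that
    #    squared row norms are preserved in expectation.
    Psi = rng.standard_normal((n, sketch_dim)) / np.sqrt(sketch_dim)
    proj_norms = np.linalg.norm(A @ Psi, axis=1)

    # 2) Robust threshold from the median and the median absolute deviation.
    #    The form tau = median + c * MAD is an assumption of this sketch.
    med = np.median(proj_norms)
    mad = np.median(np.abs(proj_norms - med))
    tau = med + c * mad
    keep = proj_norms <= tau  # rows with projected norm above tau are discarded

    # 3) Randomized SVD (range finder with oversampling) on the retained rows.
    A_clean = A[keep]
    Omega = rng.standard_normal((n, k + oversample))
    Q, _ = np.linalg.qr(A_clean @ Omega)
    U_small, s, Vt = np.linalg.svd(Q.T @ A_clean, full_matrices=False)
    U = Q @ U_small[:, :k]

    # A_clean is approximated by (U * s[:k]) @ Vt[:k].
    return keep, U, s[:k], Vt[:k]
```

The boolean mask `keep` marks the rows treated as inliers, so `A[keep]` is approximated by `(U * s) @ Vt` using the returned factors. Larger values of `c` discard fewer rows, which trades missed outliers against clean rows removed by mistake.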

Paper Structure

This paper contains 30 sections, 5 theorems, 76 equations, 9 figures, 3 tables, 1 algorithm.

Key Result

Lemma 3.1

Let $B \in \mathbb{R}^{m \times n}$ be an approximately rank-$k$ matrix, and let $N \in \mathbb{R}^{m \times n}$ satisfy $\|N_{i,:}\|_2 \le \delta$ for all $i \in S_{\mathrm{clean}}$. Define $A = B + N$. Suppose $\|B_{i,:}\|_2 \ge \kappa\,\delta$ for some $\kappa > 1$. Let $\Psi \in \mathbb{R}^{n \times \cdots}$ … Then, with probability at least $1 - \delta'$, every clean row $i \in S_{\mathrm{clean}}$ satisfies …

Figures (9)

  • Figure 1: Histogram of JL-projected row norms for a sample synthetic dataset (e.g., $\alpha=0.2$ and $\text{outlier\_scale} = 5.0$), with the vertical dashed line indicating the threshold $\tau$. Rows to the right of $\tau$ are discarded as outliers.
  • Figure 2: Precision and recall as functions of the outlier fraction $\alpha$, for different values of $c$ and $\text{outlier\_scale}$. A significant drop in recall is observed for higher $\alpha$ with smaller norm gaps.
  • Figure 3: Effect of outlier fraction $\alpha$ on the relative Frobenius error (inliers only), for various $\text{outlier\_scale}$ and threshold constant $c$. When $\alpha$ is large or the scale is small, errors can increase.
  • Figure 4: Average runtime vs. outlier fraction $\alpha$, for different threshold constants $c$ and $\text{outlier\_scale}$ values. Our approach remains under 0.4 s in most scenarios for $m = 1000$, $n = 500$.
  • Figure 5: Subspace error (largest principal angle) vs. outlier fraction $\alpha$ for repeated trials, illustrating how robust LRA maintains a lower subspace error than standard PCA or other baselines.
  • ...and 4 more figures
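The quantities plotted in these figures (detection precision and recall, relative Frobenius error on inliers, and the largest principal angle between subspaces) can be computed as in the following sketch. The function names and argument conventions are hypothetical, and the paper's exact evaluation code may differ, for example in how it handles the case where no rows are flagged.

```python
import numpy as np

def detection_precision_recall(flagged, is_outlier):
    """Precision and recall of outlier flags against ground-truth labels.

    Returns 0.0 for a ratio whose denominator is zero (a convention chosen here).
    """
    flagged = np.asarray(flagged, dtype=bool)
    is_outlier = np.asarray(is_outlier, dtype=bool)
    true_pos = np.sum(flagged & is_outlier)
    precision = true_pos / max(flagged.sum(), 1)
    recall = true_pos / max(is_outlier.sum(), 1)
    return float(precision), float(recall)

def relative_frobenius_error(A_inliers, A_hat):
    """||A_inliers - A_hat||_F / ||A_inliers||_F, evaluated on inlier rows only."""
    return float(np.linalg.norm(A_inliers - A_hat) / np.linalg.norm(A_inliers))

def largest_principal_angle(V1, V2):
    """Largest principal angle (radians) between the column spans of V1 and V2."""
    Q1, _ = np.linalg.qr(V1)  # orthonormal basis for span(V1)
    Q2, _ = np.linalg.qr(V2)  # orthonormal basis for span(V2)
    # Singular values of Q1^T Q2 are the cosines of the principal angles;
    # the smallest cosine corresponds to the largest angle.
    cosines = np.linalg.svd(Q1.T @ Q2, compute_uv=False)
    return float(np.arccos(np.clip(cosines.min(), -1.0, 1.0)))
```

Here `V1` and `V2` are assumed to be $n \times k$ matrices with linearly independent columns, for instance the transposed right singular vectors of the estimated and ground-truth rank-$k$ factors.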

Theorems & Definitions (11)

  • Lemma 3.1: Row-Wise Concentration
  • proof
  • Lemma 3.2: Outlier Detection Guarantee
  • proof
  • Theorem 4.1: Robust Low-Rank Approximation Guarantee
  • Remark 4.1: Practical Parameter Guidance
  • proof: Proof of Theorem 4.1
  • Lemma A.1: Lemma 3.1 Restated
  • proof
  • Theorem A.1: Theorem 4.1 Restated
  • ...and 1 more