RANSAC Revisited: An Improved Algorithm for Robust Subspace Recovery under Adversarial and Noisy Corruptions
Guixian Chen, Jianhao Ma, Salar Fattahi
TL;DR
The paper tackles robust subspace recovery under adversarial contamination and Gaussian noise by reformulating RSR as a two-stage spectral refinement of RANSAC (RANSAC+). The first stage produces a coarse subspace $\textbf{V}$ of dimension $\hat{r}=O(r^\bigstar)$ that nearly contains the true subspace $\textbf{S}^\bigstar$, without requiring prior knowledge of $r^\bigstar$, and with near-linear sample complexity in $r^\bigstar$. The second stage projects data onto $\textbf{V}$ and uses a robustized RANSAC procedure to recover the exact dimension $r^\bigstar$ and a high-precision subspace, achieving strong robustness to both Gaussian noise and adversarial outliers. The analysis provides formal guarantees on residual bounds, eigen-gap-based recovery, and overall runtime, showing that RANSAC+ is more efficient than classical RANSAC and nearly optimal in sample complexity for noisy adversarial settings. This yields a practically efficient, theoretically sound method for robust subspace recovery in challenging data regimes with unknown subspace dimension.
Abstract
In this paper, we study the problem of robust subspace recovery (RSR) in the presence of both strong adversarial corruptions and Gaussian noise. Specifically, given a limited number of noisy samples -- some of which are tampered with by an adaptive and strong adversary -- we aim to recover a low-dimensional subspace that approximately contains a significant fraction of the uncorrupted samples, up to an error that scales with the Gaussian noise. Existing approaches to this problem often suffer from high computational costs or rely on restrictive distributional assumptions, limiting their applicability in truly adversarial settings. To address these challenges, we revisit the classical random sample consensus (RANSAC) algorithm, which offers strong robustness to adversarial outliers, but sacrifices efficiency and robustness against Gaussian noise and model misspecification in the process. We propose a two-stage algorithm, RANSAC+, that precisely pinpoints and remedies the failure modes of standard RANSAC. Our method is provably robust to both Gaussian and adversarial corruptions, achieves near-optimal sample complexity without requiring prior knowledge of the subspace dimension, and is more efficient than existing RANSAC-type methods.
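For orientation, the classical RANSAC baseline that the abstract revisits can be sketched as follows. This is a minimal illustration of the textbook procedure, not the authors' RANSAC+ algorithm. It assumes the subspace dimension `r` is known and the inliers are noise-free, which are exactly the assumptions the paper sets out to remove. The function names, the inlier threshold, and the demo data are hypothetical choices made for this sketch. The helper `trials_needed` uses the standard RANSAC trial-count estimate: if a fraction `outlier_frac` of the points is corrupted, a random sample of `r` points is clean with probability roughly `(1 - outlier_frac) ** r`, so the number of trials grows exponentially in `r`.

```python
import math
import random


def orthonormalize(vectors, tol=1e-9):
    """Gram-Schmidt; returns None if the vectors are numerically dependent."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = sum(a * b for a, b in zip(w, q))
            w = [a - c * b for a, b in zip(w, q)]
        norm = math.sqrt(sum(a * a for a in w))
        if norm < tol:
            return None
        basis.append([a / norm for a in w])
    return basis


def residual(x, basis):
    """Euclidean distance from x to the span of an orthonormal basis."""
    w = list(x)
    for q in basis:
        c = sum(a * b for a, b in zip(x, q))
        w = [a - c * b for a, b in zip(w, q)]
    return math.sqrt(sum(a * a for a in w))


def trials_needed(outlier_frac, r, fail_prob=1e-3):
    """Standard estimate of trials so that at least one sample of r points
    is outlier-free with probability >= 1 - fail_prob."""
    p_clean = (1.0 - outlier_frac) ** r
    if p_clean >= 1.0:
        return 1
    return math.ceil(math.log(fail_prob) / math.log(1.0 - p_clean))


def ransac_subspace(points, r, threshold, n_trials, seed=0):
    """Classical RANSAC: fit a subspace to r random points, keep the
    candidate with the most points within `threshold` of it."""
    rng = random.Random(seed)
    best_basis, best_count = None, -1
    for _ in range(n_trials):
        basis = orthonormalize(rng.sample(points, r))
        if basis is None:
            continue  # degenerate sample, try again
        count = sum(residual(x, basis) <= threshold for x in points)
        if count > best_count:
            best_basis, best_count = basis, count
    return best_basis, best_count


# Hypothetical demo: 30 noise-free inliers on span(e1, e2) in R^4,
# plus 10 arbitrary outliers.
data_rng = random.Random(1)
inliers = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1), 0.0, 0.0]
           for _ in range(30)]
outliers = [[data_rng.gauss(0, 1) for _ in range(4)] for _ in range(10)]
points = inliers + outliers

n_trials = trials_needed(outlier_frac=0.25, r=2, fail_prob=1e-9)
basis, count = ransac_subspace(points, r=2, threshold=1e-6, n_trials=n_trials)
```

The sketch makes the two weaknesses named in the abstract visible. First, `trials_needed` grows exponentially with `r`, which is the efficiency cost. Second, a subspace fitted to exactly `r` samples has no averaging, so adding Gaussian noise to the inliers would tilt the fitted basis and make the fixed threshold unreliable.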
