RANSAC Revisited: An Improved Algorithm for Robust Subspace Recovery under Adversarial and Noisy Corruptions

Guixian Chen, Jianhao Ma, Salar Fattahi

TL;DR

The paper tackles robust subspace recovery under adversarial contamination and Gaussian noise by reformulating RSR as a two-stage spectral refinement of RANSAC (RANSAC+). The first stage produces a coarse subspace $\mathbf{V}$ of dimension $\hat{r}=O(r^\star)$ that nearly contains the true subspace $\mathcal{S}^\star$, without requiring prior knowledge of $r^\star$, and with near-linear sample complexity in $r^\star$. The second stage projects data onto $\mathbf{V}$ and uses a robustized RANSAC procedure to recover the exact dimension $r^\star$ and a high-precision subspace, achieving strong robustness to both Gaussian noise and adversarial outliers. The analysis provides formal guarantees on residual bounds, eigen-gap-based recovery, and overall runtime, showing that RANSAC+ is more efficient than classical RANSAC and nearly optimal in sample complexity for noisy adversarial settings. This yields a practically efficient, theoretically sound method for robust subspace recovery in challenging data regimes with unknown subspace dimension.

Abstract

In this paper, we study the problem of robust subspace recovery (RSR) in the presence of both strong adversarial corruptions and Gaussian noise. Specifically, given a limited number of noisy samples -- some of which are tampered by an adaptive and strong adversary -- we aim to recover a low-dimensional subspace that approximately contains a significant fraction of the uncorrupted samples, up to an error that scales with the Gaussian noise. Existing approaches to this problem often suffer from high computational costs or rely on restrictive distributional assumptions, limiting their applicability in truly adversarial settings. To address these challenges, we revisit the classical random sample consensus (RANSAC) algorithm, which offers strong robustness to adversarial outliers, but sacrifices efficiency and robustness against Gaussian noise and model misspecification in the process. We propose a two-stage algorithm, RANSAC+, that precisely pinpoints and remedies the failure modes of standard RANSAC. Our method is provably robust to both Gaussian and adversarial corruptions, achieves near-optimal sample complexity without requiring prior knowledge of the subspace dimension, and is more efficient than existing RANSAC-type methods.
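
For orientation, the classical RANSAC consensus loop that the abstract refers to can be sketched as follows. This is a minimal, dependency-free illustration and not the authors' RANSAC+: it assumes the subspace dimension `r` is known in advance and that a residual `threshold` is supplied by the user. The helper names (`project_out`, `orthonormal_basis`, `ransac_subspace`) and the toy data are hypothetical.

```python
import math
import random


def project_out(v, basis):
    """Remove from v its components along each (orthonormal) vector in basis."""
    w = list(v)
    for b in basis:
        coeff = sum(wi * bi for wi, bi in zip(w, b))
        w = [wi - coeff * bi for wi, bi in zip(w, b)]
    return w


def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(vi * vi for vi in v))


def orthonormal_basis(vectors, tol=1e-10):
    """Gram-Schmidt orthonormal basis for the span of `vectors`."""
    basis = []
    for v in vectors:
        w = project_out(v, basis)
        length = norm(w)
        if length > tol:  # skip vectors already (numerically) in the span
            basis.append([wi / length for wi in w])
    return basis


def ransac_subspace(samples, r, threshold, n_trials, rng):
    """Return the candidate r-dimensional basis with the largest consensus set."""
    best_basis, best_count = None, -1
    for _ in range(n_trials):
        # Hypothesis: the span of r randomly drawn samples.
        candidate = orthonormal_basis(rng.sample(samples, r))
        if len(candidate) < r:
            continue  # degenerate draw, try again
        # Consensus: samples whose distance to the candidate span is small.
        count = sum(norm(project_out(x, candidate)) <= threshold for x in samples)
        if count > best_count:
            best_basis, best_count = candidate, count
    return best_basis, best_count


# Toy data (an assumption for illustration): 40 noiseless inliers in
# span(e0, e1) of R^5 and 10 outliers spread over all of R^5.
rng = random.Random(0)
inliers = [[rng.gauss(0, 1), rng.gauss(0, 1), 0.0, 0.0, 0.0] for _ in range(40)]
outliers = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(10)]
basis, count = ransac_subspace(inliers + outliers, r=2, threshold=1e-6,
                               n_trials=200, rng=rng)
```

In this noiseless toy setting a tight threshold separates inliers from outliers cleanly. With noisy inliers the same threshold would reject clean points, which illustrates the sensitivity to Gaussian noise that the abstract attributes to standard RANSAC.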

Paper Structure

This paper contains 18 sections, 20 theorems, 96 equations, 3 figures, 3 algorithms.

Key Result

Theorem 1

Given an $(\epsilon, \Sigma_\xi)$-corrupted sample set with a sample size $n = \Omega(r^\star \log (r^\star))$ and Gaussian noise covariance satisfying $\sqrt{r^\star \left\lVert\Sigma_\xi\right\rVert} + \sqrt{{\operatorname{tr}}(\Sigma_\xi)} = \mathcal{O}\left(\gamma^\star_{\min}\right)$, there exi
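
As a worked instance of the noise condition above, assume isotropic noise $\Sigma_\xi = (\sigma^2/d)\, I_d$. This choice is an assumption made here for illustration, suggested by the noise variance $\sigma^2/d$ used in Figure 2. Then $\left\lVert\Sigma_\xi\right\rVert = \sigma^2/d$ and ${\operatorname{tr}}(\Sigma_\xi) = \sigma^2$, so

$$
\sqrt{r^\star \left\lVert\Sigma_\xi\right\rVert} + \sqrt{{\operatorname{tr}}(\Sigma_\xi)} = \sigma\sqrt{\frac{r^\star}{d}} + \sigma = \sigma\left(1 + \sqrt{\frac{r^\star}{d}}\right).
$$

Since $r^\star \le d$, this quantity lies between $\sigma$ and $2\sigma$, so under this assumption the condition reduces to $\sigma = \mathcal{O}\left(\gamma^\star_{\min}\right)$.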

Figures (3)

  • Figure 1: The performance of various RSR methods across different corruption levels $\epsilon$. The considered methods are Tyler's M-estimator (TME) [zhang2016robust], Fast Median Subspace (FMS) [lerman2018fast], Geodesic Gradient Descent (GGD) [maunu2019well], Randomized-Find (RF) [hardt2013algorithms], and the classic RANSAC algorithm [maunu2019robust]. The clean samples are drawn from $N(0, \Sigma^\star)$ with $\mathrm{rank}(\Sigma^\star) = 10$, while outliers are drawn from $N(0, \widehat{\Sigma})$ with $\mathrm{rank}(\widehat{\Sigma}) = 2$. The Gaussian noise covariance is set to zero in these experiments. The subspace spanned by the outliers is chosen to be orthogonal to $\mathcal{S}^\star$. All nonzero eigenvalues of $\Sigma^\star$ are set to $1$, and the nonzero eigenvalues of $\widehat{\Sigma}$ are set to $10$. In all cases, the ambient dimension is set to $d = 100$ and the total sample size to $n=500$.
  • Figure 2: (Left) Performance comparison of RANSAC and RANSAC+ across varying subspace dimensions $r^\star$. The search dimension $r$ for RANSAC is overestimated by one, while RANSAC+ operates without any prior knowledge of $r^\star$. (Middle) Performance of RANSAC and RANSAC+ under Gaussian noise with varying variance $\sigma^2 / d$. (Right) Runtime of RANSAC (with exact prior knowledge of $r^\star$) and RANSAC+ for different subspace dimensions $r^\star$. The data generation process follows that of Figure 1. In all cases, the total sample size is set to $n = 500$ and the adversarial corruption parameter to $\epsilon = 0.2$. For the left and middle plots, the ambient dimension is $d=100$, while for the right plot, it is $d=1000$.
  • Figure 3: Overestimation of the subspace dimension, measured by $\hat{r} / r^\star$, for the first-stage algorithm (\ref{alg:first_stage}) under varying corruption rates $\epsilon$ and Gaussian noise levels $\sigma^2$. The experimental setup follows the same example described in \ref{subsec::failure}. For each pair of values $(\epsilon, \sigma^2)$, we report the average of $\hat{r} / r^\star$ over 20 independent trials.
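
The synthetic setup described in the Figure 1 caption can be sketched as follows. This is a hedged illustration, not the authors' experiment code: it assumes that a fraction $\epsilon$ of the $n$ samples are outliers, and it takes both subspaces to be axis-aligned (the clean subspace on the first 10 coordinates, the outlier subspace on the next 2) so that orthogonality holds by construction. The function name `make_corrupted_samples` and its parameters are hypothetical.

```python
import random


def make_corrupted_samples(n=500, d=100, r_clean=10, r_out=2, eps=0.2,
                           clean_var=1.0, out_var=10.0, seed=0):
    """Generate clean samples and outliers supported on orthogonal subspaces.

    Assumptions (not from the source): round(eps * n) samples are outliers,
    they come first in the returned list, and both subspaces are axis-aligned.
    """
    rng = random.Random(seed)
    n_out = round(eps * n)
    samples, is_outlier = [], []
    for i in range(n):
        x = [0.0] * d
        if i < n_out:
            # Outlier: rank-r_out Gaussian with nonzero eigenvalues out_var,
            # supported on coordinates r_clean .. r_clean + r_out - 1.
            for j in range(r_clean, r_clean + r_out):
                x[j] = rng.gauss(0.0, out_var ** 0.5)
        else:
            # Clean sample: rank-r_clean Gaussian with nonzero eigenvalues
            # clean_var, supported on the first r_clean coordinates.
            for j in range(r_clean):
                x[j] = rng.gauss(0.0, clean_var ** 0.5)
        samples.append(x)
        is_outlier.append(i < n_out)
    return samples, is_outlier


samples, is_outlier = make_corrupted_samples()
```

Because the outliers have larger variance and lower rank than the clean samples, they dominate the top principal directions of the pooled data, which is one plausible reason this setup is hard for purely spectral methods.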

Theorems & Definitions (33)

  • Theorem 1
  • Theorem 2
  • Lemma 1: Theorem 6.1 in [wainwright2019high]
  • Lemma 2: Proposition 2.5 in [wainwright2019high]
  • Lemma 3: [chvatal1979tail]
  • Lemma 4: Corollary 2.5 in [lecue2017sparse]
  • Lemma 5
  • proof
  • Theorem 3
  • Proposition 1
  • ...and 23 more