Global Convergence of Iteratively Reweighted Least Squares for Robust Subspace Recovery

Gilad Lerman, Kang Li, Tyler Maunu, Teng Zhang

TL;DR

These results provide the first global convergence guarantees for IRLS in robust subspace recovery and, more broadly, for nonconvex IRLS on a Riemannian manifold.

Abstract

Robust subspace estimation is fundamental to many machine learning and data analysis tasks. Iteratively Reweighted Least Squares (IRLS) is an elegant and empirically effective approach to this problem, yet its theoretical properties remain poorly understood. This paper establishes that, under deterministic conditions, a variant of IRLS with dynamic smoothing regularization converges linearly to the underlying subspace from any initialization. We extend these guarantees to affine subspace estimation, a setting that lacks prior recovery theory. Additionally, we illustrate the practical benefits of IRLS through an application to low-dimensional neural network training. Our results provide the first global convergence guarantees for IRLS in robust subspace recovery and, more broadly, for nonconvex IRLS on a Riemannian manifold.
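
The basic IRLS iteration for robust subspace recovery can be sketched in the style of FMS (Fast Median Subspace). Each point is weighted by the inverse of its distance to the current subspace, floored at a fixed smoothing constant ε, and the next subspace is spanned by the top-d eigenvectors of the weighted covariance. This is a minimal sketch that assumes a fixed ε and PCA initialization. It is not the paper's Algorithm alg:IRLS, which uses dynamic smoothing, and the function name irls_subspace and its defaults are hypothetical.

```python
import numpy as np


def irls_subspace(X, d, eps=1e-10, n_iter=100, tol=1e-12):
    """Hypothetical FMS-style IRLS with a FIXED smoothing floor eps.

    X: (n, D) data matrix, one point per row; d: subspace dimension.
    Returns a (D, d) matrix with orthonormal columns spanning the estimate.
    """
    # PCA initialization: top-d right singular vectors of X
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    L = Vt[:d].T
    for _ in range(n_iter):
        # Euclidean distance of each point to span(L)
        dist = np.linalg.norm(X - (X @ L) @ L.T, axis=1)
        # IRLS weights 1 / max(dist, eps); eps avoids division by zero
        w = 1.0 / np.maximum(dist, eps)
        # weighted covariance: sum_i w_i x_i x_i^T
        C = (X * w[:, None]).T @ X
        # eigh returns eigenvalues in ascending order, so keep the last d
        _, evecs = np.linalg.eigh(C)
        L_new = evecs[:, -d:]
        # stop once the orthogonal projector onto the estimate stops moving
        converged = np.linalg.norm(L_new @ L_new.T - L @ L.T) < tol
        L = L_new
        if converged:
            break
    return L


# Usage: 200 inliers on a random 2-dim subspace of R^5, plus 100 Gaussian outliers
rng = np.random.default_rng(0)
D, d = 5, 2
basis, _ = np.linalg.qr(rng.standard_normal((D, d)))
X = np.vstack([
    rng.standard_normal((200, d)) @ basis.T,
    rng.standard_normal((100, D)),
])
L = irls_subspace(X, d)
# spectral-norm distance between the estimated and true projectors
err = np.linalg.norm(L @ L.T - basis @ basis.T, 2)
```

Comparing projectors rather than bases makes the error independent of the choice of orthonormal basis for each subspace.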

Paper Structure

This paper contains 36 sections, 19 theorems, 197 equations, 6 figures, 1 algorithm.

Key Result

Theorem 1

[Global Linear Convergence of FMS-DS] Let $\mathcal{X}$ be a dataset satisfying Assumptions assump:lowerdim, assump:global, and assump:global2 for a given constant $\gamma>0$. Then the sequence $L^{(k)}$ generated by Algorithm alg:IRLS with dynamic smoothing hyperparameter $\gamma$ converges to the underlying subspace $L_\star$ …
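
Theorem 1 concerns the dynamic-smoothing variant, in which the smoothing floor ε changes across iterations under the control of γ. The exact update for ε is given in Algorithm alg:IRLS and is not reproduced here. The sketch below instead assumes, purely for illustration, a non-increasing schedule ε_k = max(eps_min, min(ε_{k-1}, γ · median_i dist(x_i, L_k))). The rule, the names irls_dynamic_smoothing and eps_min, and all defaults are hypothetical.

```python
import numpy as np


def irls_dynamic_smoothing(X, d, gamma=0.1, n_iter=200, eps_min=1e-12):
    """IRLS with a HYPOTHETICAL dynamic smoothing schedule (illustration only).

    eps_k = max(eps_min, min(eps_{k-1}, gamma * median_i dist(x_i, L_k))),
    which is non-increasing in k. Returns the (D, d) basis and the eps history.
    """
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    L = Vt[:d].T  # PCA initialization
    eps = np.inf
    eps_hist = []
    for _ in range(n_iter):
        dist = np.linalg.norm(X - (X @ L) @ L.T, axis=1)
        # hypothetical schedule: shrink eps with the typical residual size
        eps = max(eps_min, min(eps, gamma * float(np.median(dist))))
        eps_hist.append(eps)
        w = 1.0 / np.maximum(dist, eps)
        C = (X * w[:, None]).T @ X
        # top-d eigenvectors of the weighted covariance
        L = np.linalg.eigh(C)[1][:, -d:]
    return L, eps_hist


# Usage on the same kind of synthetic data: 2-dim inliers in R^5 plus outliers
rng = np.random.default_rng(0)
D, d = 5, 2
basis, _ = np.linalg.qr(rng.standard_normal((D, d)))
X = np.vstack([
    rng.standard_normal((200, d)) @ basis.T,
    rng.standard_normal((100, D)),
])
L, eps_hist = irls_dynamic_smoothing(X, d, gamma=0.1)
err = np.linalg.norm(L @ L.T - basis @ basis.T, 2)
```

Tying ε to the current residual scale keeps the iteration heavily smoothed early on and lets the weights approach those of the unsmoothed objective as the iterates converge. The caption of Figure 5 reports that a larger γ helps FMS-DS escape a saddle point from a bad initialization.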

Figures (6)

  • Figure 1: Performance comparison of PCA, TME, RANSAC, STE, and FMS on synthetic data, reporting the geometric mean of the error over 200 repetitions on a log scale. This corresponds to the averaged log-error. We vary the outlier dimension across rows and the inlier dimension across columns: top row, outlier dimension $d_{\mathrm{out}} = 1$; middle rows, $d_{\mathrm{out}} = 5$ and $10$; bottom row, $d_{\mathrm{out}} = 50$. Left column, inlier dimension $d = 3$; middle column, $d = 10$; right column, $d = 50$. FMS-DS with small $\gamma$ performs well across this range of settings. RANSAC performs well for small $d$ but fails for larger $d$, because its runtime is exponential in $d$ and we cap its number of iterations.
  • Figure 2: Performance comparison of PCA, TME, RANSAC, STE, and FMS on synthetic data, reporting the geometric mean of the error over 200 repetitions on a log scale. This corresponds to the averaged log-error. The inlier and outlier dimensions are fixed at $d = 50$ and $d_{\mathrm{out}} = 50$, while the number of samples $n$ varies: left: $n = 200$, middle: $n = 300$, right: $n = 400$. As $n$ increases, the performance of FMS and FMS-DS improves substantially and becomes comparable to STE and TME, whereas RANSAC performs poorly across all values of $n$.
  • Figure 3: Averaged log-error versus outlier percentage for the experiment with adversarial initialization. Here, the inlier subspace dimension is 3 and the outlier subspace is a 1-dimensional subspace orthogonal to it. FMS is initialized with two directions within the inlier subspace and one direction orthogonal to it.
  • Figure 4: Failure rate of the various FMS algorithms under the same setting as Figure 3, for different outlier percentages. Left: full range of outlier percentages. Right: zoomed-in view.
  • Figure 5: Averaged log-error versus iteration for FMS with different regularization strategies. Left: convergence from PCA initialization, as in Experiment 1. Right: convergence from the orthogonal initialization of Experiment 2. The left plot shows that, with good initialization and no bad stationary points, both FMS with a small fixed $\epsilon=10^{-15}$ and FMS-DS with a small $\gamma=0.1$ perform well. From the bad initialization, by contrast, FMS with larger $\epsilon$ and FMS-DS with larger $\gamma$ escape the saddle point faster.
  • ...and 1 more figure
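
Figures 1 and 2 summarize each setting by the geometric mean of the error over 200 repetitions, which equals the exponential of the averaged log-error. The sketch below checks that identity. The function names are hypothetical, and the error values are illustrative only (not results from the paper). They show why one failed run does not swamp the summary the way it would swamp an arithmetic mean.

```python
import numpy as np


def geometric_mean(errors):
    """exp(mean(log e_i)): the geometric mean of positive errors."""
    errors = np.asarray(errors, dtype=float)
    return float(np.exp(np.mean(np.log(errors))))


def averaged_log10_error(errors):
    """mean(log10 e_i), i.e. log10 of the geometric mean."""
    return float(np.mean(np.log10(np.asarray(errors, dtype=float))))


# Illustrative values only: 199 near-exact recoveries and one failure.
errors = [1e-12] * 199 + [1.0]
print(np.mean(errors))               # arithmetic mean, about 5e-3 (dominated by the failure)
print(geometric_mean(errors))        # about 1.15e-12
print(averaged_log10_error(errors))  # about -11.94
```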
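
The adversarial setting of Figure 3 has inliers on a 3-dimensional subspace, outliers on a 1-dimensional subspace orthogonal to it, and an initialization with two directions inside the inlier subspace and one direction orthogonal to it. A sketch of how such data and such an initialization might be generated follows. The ambient dimension, sample sizes, Gaussian sampling, the function names, and the use of the outlier direction as the orthogonal initial direction are all assumptions for illustration. The paper's exact protocol may differ.

```python
import numpy as np


def make_orthogonal_outlier_data(n_in, n_out, D, rng):
    """Inliers on a 3-dim subspace, outliers on a 1-dim subspace orthogonal to it."""
    Q, _ = np.linalg.qr(rng.standard_normal((D, 4)))
    # columns of Q are orthonormal, so u_out is orthogonal to span(U_in)
    U_in, u_out = Q[:, :3], Q[:, 3]
    inliers = rng.standard_normal((n_in, 3)) @ U_in.T
    outliers = rng.standard_normal((n_out, 1)) * u_out  # scalar multiples of u_out
    return np.vstack([inliers, outliers]), U_in, u_out


def adversarial_init(U_in, u_orth):
    """Two directions inside span(U_in) plus one direction orthogonal to it."""
    return np.column_stack([U_in[:, 0], U_in[:, 1], u_orth])


# Hypothetical sizes: 100 inliers and 50 outliers in R^10
rng = np.random.default_rng(1)
X, U_in, u_out = make_orthogonal_outlier_data(n_in=100, n_out=50, D=10, rng=rng)
L0 = adversarial_init(U_in, u_out)
```

Only two of the three initial directions lie in the inlier subspace. The caption of Figure 5 reports that, from this orthogonal initialization, FMS with larger ε and FMS-DS with larger γ escape the saddle point faster.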

Theorems & Definitions (40)

  • Remark 1
  • Remark 2
  • Remark 3
  • Theorem 1
  • Proof of Theorem 1
  • Lemma 1: Properties of smoothed objective function
  • Lemma 2: Decrease over iterations
  • Lemma 3: Nonzero gradient, local guarantee
  • Lemma 4: $L^{(k)}$ is in a local neighborhood of $L_\star$
  • Lemma 5: Bound on the objective value
  • ...and 30 more