On the Robustness of the Successive Projection Algorithm

Giovanni Barbarino, Nicolas Gillis

TL;DR

This work analyzes the robustness of the successive projection algorithm (SPA) for separable simplex-structured matrix factorization (SSMF) under noise, formalizing how the conditioning of the vertex matrix $W$ governs recovery error. It introduces tighter bounds for the first SPA step, extends improved guarantees to the rank-2 case and to certain translated variants (T-SPA), and proves tightness results for SPA, SPA$^2$, and MVIE-based preconditioning. A novel translation+lifting variant (TL-SPA) is proposed to reduce conditioning and improve practical robustness, with gains validated on synthetic datasets including adversarial middle-point noise and rank-deficient scenarios. Overall, the results provide both theoretical guarantees and practical guidance for selecting SPA variants and preprocessing to reliably recover latent simplex vertices in noisy environments.

Abstract

The successive projection algorithm (SPA) is a workhorse algorithm to learn the $r$ vertices of the convex hull of a set of $(r-1)$-dimensional data points, a.k.a. a latent simplex, which has numerous applications in data science. In this paper, we revisit the robustness to noise of SPA and several of its variants. In particular, when $r \geq 3$, we prove the tightness of the existing error bounds for SPA and for two more robust preconditioned variants of SPA. We also provide significantly improved error bounds for SPA, by a factor proportional to the conditioning of the $r$ vertices, in two special cases: for the first extracted vertex, and when $r \leq 2$. We then provide further improvements for the error bounds of a translated version of SPA proposed by Arora et al. ("A practical algorithm for topic modeling with provable guarantees", ICML, 2013) in two special cases: for the first two extracted vertices, and when $r \leq 3$. Finally, we propose a new, more robust variant of SPA that first shifts and lifts the data points in order to minimize the conditioning of the problem. We illustrate our results on synthetic data.

Paper Structure

This paper contains 27 sections, 16 theorems, 85 equations, 8 figures, 4 tables, 3 algorithms.

Key Result

Theorem 1

[gillis2013fast] Let $X = WH + N \in \mathbb{R}^{m \times n}$ where $H \in \mathbb{R}^{r \times n}_+$ is a separable matrix satisfying $H^\top e \leq e$, with $e$ the vector of all ones of appropriate dimension, and let ... where $\sigma_r(W)$ is the $r$th singular value of $W$, and $\mathcal{K}(W) = \frac{K(W)}{\sigma_r(W)}$ with $K(W) = \max_k \|W(:,k)\|$ is a measure of the conditioning. Note that ...
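
To make the quantities in the theorem concrete, here is a minimal Python sketch. It computes $\mathcal{K}(W) = K(W)/\sigma_r(W)$ exactly as defined above and runs SPA on a small synthetic instance of $X = WH + N$. The SPA loop follows the commonly described form of the algorithm (select the column of largest norm, project it out, repeat) and is not taken from this page; the function names, matrix sizes, random seed, noise level, and the 0.8 scaling of the mixed columns are all illustrative assumptions.

```python
import numpy as np

def conditioning(W):
    """Return K(W) / sigma_r(W), with K(W) the largest column norm of W."""
    r = W.shape[1]
    K = np.linalg.norm(W, axis=0).max()
    sigma_r = np.linalg.svd(W, compute_uv=False)[r - 1]
    return K / sigma_r

def spa(X, r):
    """Commonly described SPA: pick the largest-norm column, project it out, repeat."""
    R = np.array(X, dtype=float)          # residual matrix, updated at each step
    selected = []
    for _ in range(r):
        j = int(np.argmax(np.sum(R**2, axis=0)))
        selected.append(j)
        u = R[:, j] / np.linalg.norm(R[:, j])
        R = R - np.outer(u, u @ R)        # project onto the orthogonal complement of u
    return selected

# Synthetic instance (all sizes and constants below are assumptions).
rng = np.random.default_rng(0)
m, r, n = 5, 3, 50
W = rng.random((m, r))
# Separable H: an identity block (the vertices) followed by nonnegative
# columns whose entries sum to 0.8, so that H^T e <= e holds.
H_mixed = 0.8 * rng.dirichlet(np.ones(r), size=n - r).T
H = np.hstack([np.eye(r), H_mixed])
N = 1e-6 * rng.standard_normal((m, n))    # small noise
X = W @ H + N

idx = spa(X, r)
print("selected columns:", sorted(idx))
print("K(W) / sigma_r(W):", conditioning(W))
```

On this instance the vertices sit in the first $r$ columns of $X$, so a successful run selects those indices in some order. Increasing the noise level or choosing a worse-conditioned $W$ is a simple way to probe how recovery degrades as $\mathcal{K}(W)$ grows.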

Figures (8)

  • Figure 1: Columns of matrix $X$ as black dots and of matrix $\widetilde{X}$ as green dots.
  • Figure 2: Columns of matrix $\widetilde{X}(\{2,3\},:)$. The points of the dashed red line have the same norm.
  • Figure 3: Columns of matrix $X$ as empty blue circles and columns of matrix $W$ as red dots. The points on the circle (dashed blue line) have the same norm.
  • Figure 4: Columns of the preconditioned matrix $\widetilde{X}$. The points on the circle (dashed blue line) have the same norm.
  • Figure 5: Columns of matrix $X$ as empty blue circles and columns of matrix $W$ as red dots. The dashed blue line is the minimum volume ellipsoid containing $x_1$ and $x_2$. The dashed green line is the minimum volume ellipsoid containing all columns of $X$.
  • ...and 3 more figures

Theorems & Definitions (37)

  • Remark 1: Implementation of SPA
  • Theorem 1
  • Remark 2: Implementation of T-SPA
  • Theorem 2
  • Theorem 3
  • Lemma 1
  • proof
  • Lemma 2
  • proof
  • proof: Proof of Theorem \ref{theo:single_step_SPA}
  • ...and 27 more