Optimal Bound for PCA with Outliers using Higher-Degree Voronoi Diagrams

Sajjad Hashemian, Mohammad Saeed Arvenaghi, Ebrahim Ardeshir-Larijani

TL;DR

This work addresses robust PCA in the presence of outliers by recasting the problem through higher-degree Voronoi diagrams to partition the subspace search space and identify the optimal $r$-dimensional subspace. The authors present an exact algorithm with worst-case time $n^{d+\mathcal{O}(1)} \cdot \text{poly}(n,d)$ and a randomized method with time $2^{\mathcal{O}(r(d-r))} \cdot \text{poly}(n,d)$ that leverages Grassmannian sampling and an $\alpha$-gap separation to guarantee high-probability recovery of the correct subspace. The approach provides a clearer, geometry-driven framework for outlier-robust PCA and offers practical scalability for high-dimensional data, along with theoretical optimality bounds under standard complexity assumptions. Potential extensions include improved sampling strategies and dual geometric constructions such as Delaunay triangulations and online variants of PCA.

Abstract

In this paper, we introduce new algorithms for Principal Component Analysis (PCA) with outliers. Using techniques from computational geometry, specifically higher-degree Voronoi diagrams, we navigate to the optimal subspace for PCA even in the presence of outliers. This approach finds an optimal solution in time $n^{d+\mathcal{O}(1)} \cdot \text{poly}(n,d)$. Additionally, we present a randomized algorithm with time complexity $2^{\mathcal{O}(r(d-r))} \cdot \text{poly}(n, d)$. This algorithm samples subspaces characterized in terms of a Grassmannian manifold. With this sampling method, we ensure a high likelihood of capturing the optimal subspace, with success probability $(1 - \delta)^T$, where $\delta$ represents the probability that a sampled subspace does not contain the optimal solution and $T$ is the number of subspaces sampled, proportional to $2^{r(d-r)}$. Our use of higher-degree Voronoi diagrams and Grassmannian-based sampling offers a clearer conceptual pathway and practical advantages, particularly when handling large datasets or higher-dimensional settings.
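The sampling idea in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: it is a naive random search that draws $r$-dimensional subspaces uniformly from the Grassmannian (via QR of a Gaussian matrix), treats the $k$ points with the largest residuals as outliers, and keeps the cheapest candidate. The function names, the parameter `num_samples` (playing the role of $T$), and the outlier count `k` are assumptions introduced for illustration only.

```python
import numpy as np

def sample_subspace(d, r, rng):
    """Orthonormal basis (d x r) of a uniformly random r-dim subspace of R^d."""
    gaussian = rng.standard_normal((d, r))
    # The column span of a Gaussian matrix is uniformly distributed on the
    # Grassmannian; reduced QR gives an orthonormal basis of that span.
    q, _ = np.linalg.qr(gaussian)
    return q

def cost_with_outliers(points, basis, k):
    """Sum of squared residuals to span(basis) after discarding the k largest."""
    projected = points @ basis @ basis.T
    residuals = np.sum((points - projected) ** 2, axis=1)
    order = np.argsort(residuals)          # ascending residuals
    n_keep = len(points) - k
    keep, outliers = order[:n_keep], order[n_keep:]
    return residuals[keep].sum(), np.sort(outliers)

def sampled_pca_with_outliers(points, r, k, num_samples, seed=0):
    """Hypothetical helper: best of num_samples random subspaces."""
    rng = np.random.default_rng(seed)
    best_cost, best_basis, best_outliers = np.inf, None, None
    for _ in range(num_samples):
        basis = sample_subspace(points.shape[1], r, rng)
        cost, outliers = cost_with_outliers(points, basis, k)
        if cost < best_cost:
            best_cost, best_basis, best_outliers = cost, basis, outliers
    return best_cost, best_basis, best_outliers

# Toy data (an assumption): 20 inliers on the x-axis plus 2 far-off outliers.
inliers = np.column_stack([np.linspace(-5, 5, 20), np.zeros(20)])
demo_points = np.vstack([inliers, [[0.5, 8.0], [-0.5, -9.0]]])
demo_cost, demo_basis, demo_outliers = sampled_pca_with_outliers(
    demo_points, r=1, k=2, num_samples=2000
)
```

Plain random search like this only approximates the optimum; the guarantees stated in the abstract rely on the paper's own analysis, which the sketch does not reproduce.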

Paper Structure

This paper contains 7 sections, 12 theorems, 36 equations, 2 figures, and 2 algorithms.

Key Result

Proposition 1

(basu_algorithms_2003, Theorem 13.22) Consider $V$ as an algebraic set in $\mathbb{R}^d$ of real dimension $d'$, defined by $Q(X_1, \ldots, X_d) = 0$, where $Q$ is a polynomial in $\mathbb{R}[X_1, \ldots, X_d]$ of degree at most $b$. Let $\mathcal{P} \subset \mathbb{R}[X_1, \ldots, X_d]$ be a finite

Figures (2)

  • Figure 1: A Voronoi diagram of 15 data points that partitions the 2D plane. The query point labeled $P$ is closer to the point $X_9$ than to any other point in the dataset, a property shared by every point in the region of $X_9$.
  • Figure 2: 2nd-degree Voronoi diagram of the furthest subspace for 8 points in $\mathbb{R}^2$, which can be used to solve PCA with outliers with goal dimension 1 and 2 outliers. The projection of the subspaces for each Voronoi cell of the furthest subspace is shown on the unit circle (smaller circle), and the image of the subspaces for the Voronoi cell of the second furthest subspace is shown on the circle with radius 2 (larger circle). Space coloring indicates different 2nd-degree cells, and subspaces of the same color belong to the same cell. In this figure, for example, 7 cells are specified; since 6 of the 8 points belong to exactly one subspace, the number of possible cells is far smaller than the number of all possible combinations of outlier points.
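
The setting of Figure 2 can be approximated numerically. The sketch below is a rough illustration, not the paper's construction: it assumes the candidate subspaces are lines through the origin in $\mathbb{R}^2$, sweeps their angle over a fixed grid, and labels each line with the unordered set of its $k = 2$ furthest points. Each distinct label corresponds to one cell. The random points, the grid resolution `num_angles`, and the function name are assumptions, and a grid sweep can miss very thin cells.

```python
import numpy as np

def furthest_k_labels(points, k, num_angles=3600):
    """Label each sampled line through the origin by its k furthest points."""
    angles = np.linspace(0.0, np.pi, num_angles, endpoint=False)
    directions = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    # Distance from x to the line spanned by unit vector u is |x . u_perp|.
    normals = np.stack([-directions[:, 1], directions[:, 0]], axis=1)
    distances = np.abs(normals @ points.T)            # shape (num_angles, n)
    furthest = np.argsort(distances, axis=1)[:, -k:]  # indices of k largest
    return [frozenset(row.tolist()) for row in furthest]

rng = np.random.default_rng(0)
points = rng.standard_normal((8, 2))   # 8 points in the plane, as in Figure 2
labels = furthest_k_labels(points, k=2)
distinct_cells = set(labels)           # one entry per observed cell
```

Comparing `len(distinct_cells)` with the $\binom{8}{2} = 28$ possible outlier pairs gives a feel for the caption's remark that far fewer cells occur than outlier combinations.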

Theorems & Definitions (25)

  • Proposition 1
  • Proposition 2
  • proof
  • Proposition 3
  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Proposition 4
  • Theorem 1
  • ...and 15 more