
Heavy-Tailed Principal Component Analysis

Mario Sayde, Christopher Khater, Jihad Fahs, Ibrahim Abou-Faycal

Abstract

Principal Component Analysis (PCA) is a cornerstone of dimensionality reduction, yet its classical formulation relies critically on second-order moments and is therefore fragile in the presence of heavy-tailed data and impulsive noise. While numerous robust PCA variants have been proposed, most either assume finite variance, rely on sparsity-driven decompositions, or address robustness through surrogate loss functions without a unified treatment of infinite-variance models. In this paper, we study PCA for high-dimensional data generated according to a superstatistical dependent model of the form $\mathbf{X} = A^{1/2}\mathbf{G}$, where $A$ is a positive random scalar and $\mathbf{G}$ is a Gaussian vector. This framework captures a wide class of heavy-tailed distributions, including multivariate $t$ and sub-Gaussian $\alpha$-stable laws. We formulate PCA under a logarithmic loss, which remains well defined even when moments do not exist. Our main theoretical result shows that, under this loss, the principal components of the heavy-tailed observations coincide with those obtained by applying standard PCA to the covariance matrix of the underlying Gaussian generator. Building on this insight, we propose robust estimators for this covariance matrix directly from heavy-tailed data and compare them with the empirical covariance and Tyler's scatter estimator. Extensive experiments, including background denoising tasks, demonstrate that the proposed approach reliably recovers principal directions and significantly outperforms classical PCA in the presence of heavy-tailed and impulsive noise, while remaining competitive under Gaussian noise.
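As a hedged illustration of the generative model $\mathbf{X} = A^{1/2}\mathbf{G}$, the following NumPy sketch draws samples for the two families named above. It assumes that $A$ is independent of $\mathbf{G}$ and that $\mathbf{G}$ is zero-mean. For the multivariate $t$ case it uses $A = \nu/\chi^2_\nu$; for the sub-Gaussian $\alpha$-stable case it draws $A$ as a positive $(\alpha/2)$-stable variable via Kanter's representation, normalized so that $\mathbb{E}[e^{-tA}] = e^{-t^{\alpha/2}}$, and omits the usual scale constant (a constant rescaling of $A$ rescales every sample equally and leaves the principal directions unchanged). The function names `positive_stable` and `sample_superstatistical` are hypothetical and do not come from the paper.

```python
import numpy as np


def positive_stable(beta, size, rng):
    """Positive beta-stable draws (0 < beta < 1) via Kanter's representation,
    normalized so that E[exp(-t * S)] = exp(-t**beta)."""
    u = rng.uniform(0.0, np.pi, size)   # U ~ Uniform(0, pi)
    e = rng.exponential(1.0, size)      # E ~ Exp(1), independent of U
    return (np.sin(beta * u) / np.sin(u) ** (1.0 / beta)
            * (np.sin((1.0 - beta) * u) / e) ** ((1.0 - beta) / beta))


def sample_superstatistical(n, cov, kind="t", nu=1.0, alpha=1.5, seed=None):
    """Draw n samples of X = A^{1/2} G with G ~ N(0, cov) and A independent of G.

    kind="t":      multivariate t with nu degrees of freedom, A = nu / chi2_nu.
    kind="stable": sub-Gaussian alpha-stable (0 < alpha < 2), A positive
                   (alpha/2)-stable; the scale constant of A is omitted.
    Returns the (n, d) sample matrix and the n mixing variables A.
    """
    rng = np.random.default_rng(seed)
    d = cov.shape[0]
    g = rng.multivariate_normal(np.zeros(d), cov, size=n)
    if kind == "t":
        a = nu / rng.chisquare(nu, size=n)
    elif kind == "stable":
        a = positive_stable(alpha / 2.0, n, rng)
    else:
        raise ValueError(f"unknown kind: {kind!r}")
    # Each row of g is scaled by its own sqrt(A_i).
    return np.sqrt(a)[:, None] * g, a


cov_g = np.diag([4.0, 1.0, 0.25])
x_t, a_t = sample_superstatistical(1000, cov_g, kind="t", nu=1.0, seed=0)
x_s, a_s = sample_superstatistical(1000, cov_g, kind="stable", alpha=1.2, seed=0)
print(x_t.shape, x_s.shape)  # (1000, 3) (1000, 3)
```

With `kind="t"` and `nu=1.0` the samples are multivariate Cauchy, and with `kind="stable"` and `alpha < 2` they have infinite variance, so the second moments that classical PCA relies on do not exist in either case.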

Paper Structure

This paper contains 14 sections, 3 theorems, 29 equations, and 8 figures.

Key Result

Theorem 1

Let $\mathbf{X} \in \mathbb{R}^d$ be a random vector and let …, where $\mathsf{W} \in \mathbb{R}^{d\times m}$ satisfies $\mathsf{W}^\top \mathsf{W} = \mathsf{I}_m$. Then …
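The hypothesis $\mathsf{W}^\top \mathsf{W} = \mathsf{I}_m$ in Theorem 1 means that $\mathsf{W}$ has orthonormal columns, so $\mathsf{W}\mathsf{W}^\top$ is the orthogonal projector onto its column space (Definition 4.1). The short NumPy sketch below illustrates only this setting, not the theorem's conclusion: it builds such a $\mathsf{W}$ from the $m$ leading eigenvectors of a symmetric matrix, which is how standard PCA chooses its subspace, and projects samples onto it. According to the abstract, taking that matrix to be the covariance of the Gaussian generator $\mathbf{G}$ yields the principal components under the logarithmic loss. The helper names `top_m_basis` and `project` and the example matrix are hypothetical.

```python
import numpy as np


def top_m_basis(sym, m):
    """d x m matrix whose columns are eigenvectors of the symmetric matrix
    `sym` for its m largest eigenvalues; the columns are orthonormal."""
    _, vecs = np.linalg.eigh(sym)          # eigenvalues in ascending order
    return vecs[:, ::-1][:, :m]            # reorder to descending, keep m


def project(x, w):
    """Project each row of x onto the column space of w (assumes w^T w = I_m)."""
    return x @ w @ w.T


rng = np.random.default_rng(0)
b = rng.standard_normal((4, 4))
cov_g = b @ b.T + 0.1 * np.eye(4)          # hypothetical covariance of G
w = top_m_basis(cov_g, 2)
x = rng.standard_normal((10, 4))
print(np.allclose(w.T @ w, np.eye(2)))     # True
```

Because the columns of `w` are orthonormal, applying `project` twice gives the same result as applying it once, which is the defining property of an orthogonal projector.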

Figures (8)

  • Figure 1: Estimating $\rho$ in Eq. \ref{eq:estrho} using the first and second methods.
  • Figure 2: Comparison of the first method (Eq. \ref{eq:method1a}), Tyler's scatter estimator, and the empirical covariance estimator used by standard PCA, under heavy-tailed and Gaussian data.
  • Figure 3: The first principal component (PC1) estimated with Eq. \ref{eq:method1c} and with standard PCA, compared with the true PC1 of the sub-Gaussian data.
  • Figure 4:
  • Figure 5:
  • ...and 3 more figures
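Figure 2 compares the first proposed method with two baselines: Tyler's scatter estimator and the empirical covariance used by standard PCA. The paper's own estimators are not reproduced here. As a hedged sketch of the two baselines only, the code below implements the standard fixed-point iteration for Tyler's M-estimator of scatter (assuming a known zero center, with the trace normalized to $d$) and compares the leading eigenvector of each matrix with the true PC1 on multivariate Cauchy data, generated as $\mathbf{X} = A^{1/2}\mathbf{G}$ with $A = 1/\chi^2_1$. The helper names, dimensions, and sample size are illustrative assumptions.

```python
import numpy as np


def tyler_scatter(x, n_iter=500, tol=1e-10):
    """Tyler's M-estimator of scatter for zero-centered rows of x (n x d),
    normalized to trace d. Iterates the fixed point
    S <- (d/n) * sum_i x_i x_i^T / (x_i^T S^{-1} x_i)."""
    n, d = x.shape
    sigma = np.eye(d)
    for _ in range(n_iter):
        inv = np.linalg.inv(sigma)
        q = np.einsum("ij,jk,ik->i", x, inv, x)   # x_i^T S^{-1} x_i per sample
        new = (d / n) * (x / q[:, None]).T @ x
        new *= d / np.trace(new)                    # fix the arbitrary scale
        if np.linalg.norm(new - sigma) < tol:       # Frobenius norm
            return new
        sigma = new
    return sigma


def leading_direction(m):
    """Unit eigenvector of the symmetric matrix m for its largest eigenvalue."""
    _, vecs = np.linalg.eigh(m)    # eigenvalues in ascending order
    return vecs[:, -1]


rng = np.random.default_rng(0)
d, n = 5, 2000
cov_g = np.diag([5.0, 2.0, 1.0, 0.5, 0.25])          # true PC1 is e_1
g = rng.multivariate_normal(np.zeros(d), cov_g, size=n)
x = g / np.sqrt(rng.chisquare(1.0, size=n))[:, None]  # multivariate Cauchy

true_pc1 = np.eye(d)[:, 0]
tyler = tyler_scatter(x)
empirical = x.T @ x / n          # population second moments do not exist here
align_tyler = abs(leading_direction(tyler) @ true_pc1)
align_emp = abs(leading_direction(empirical) @ true_pc1)
print(f"|<PC1, e1>| Tyler: {align_tyler:.3f}, empirical: {align_emp:.3f}")
```

Each term $\mathbf{x}_i\mathbf{x}_i^\top / (\mathbf{x}_i^\top \mathsf{S}^{-1}\mathbf{x}_i)$ is unchanged when $\mathbf{x}_i$ is rescaled, so the random factor $A$ cancels from Tyler's weights, whereas the empirical estimate can be dominated by the few samples with the largest $A$.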

Theorems & Definitions (7)

  • Definition 4.1: Column space
  • Theorem 1
  • proof
  • Theorem 2
  • proof
  • Theorem 3
  • proof