PCA++: How Uniformity Induces Robustness to Background Noise in Contrastive Learning

Mingqi Wu; Qiang Sun; Yi Yang

TL;DR

This work studies how to robustly recover a shared low-dimensional signal from paired high-dimensional data corrupted by structured background. It first analyzes alignment-only PCA (PCA+) and then introduces PCA++, a hard uniformity-constrained variant that reduces to a generalized eigenproblem and remains stable in high dimensions. The authors derive exact high-dimensional asymptotics in both the fixed-aspect-ratio and growing-spike regimes, showing that explicit feature dispersion regularizes against background interference. Empirically, PCA++ outperforms standard PCA and PCA+ on simulations, corrupted MNIST, and single-cell RNA-seq data, and the theory clarifies uniformity's role as a robust regularizer in contrastive learning. Overall, the paper links uniformity to practical robustness, with implications for self-supervised and multiview learning in noisy, high-dimensional settings.
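To make the contrast between variance-seeking PCA and alignment-only PCA+ concrete, here is a minimal NumPy sketch under stated assumptions. PCA+ is taken to mean the top eigenvectors of the symmetrized cross-covariance of the positive pairs (maximizing tr(V^T S_+ V) with V^T V = I). The toy data model (a shared signal plus a strong background redrawn independently for each view) and the helper names `pca`, `pca_plus`, `captured`, and `make_view` are hypothetical illustrations, not the paper's code.

```python
import numpy as np


def top_k_eigvecs(M, k):
    """Top-k eigenvectors of a symmetric matrix (np.linalg.eigh sorts ascending)."""
    _, vecs = np.linalg.eigh(M)
    return vecs[:, ::-1][:, :k]


def pca(X, X_pos, k):
    """Standard PCA on both views pooled: top-k eigenvectors of the covariance."""
    Z = np.vstack([X, X_pos])
    Z = Z - Z.mean(axis=0)
    return top_k_eigvecs(Z.T @ Z / len(Z), k)


def pca_plus(X, X_pos, k):
    """Alignment-only sketch (assumed form): top-k eigenvectors of the
    symmetrized cross-covariance S_+ between the two views."""
    n = len(X)
    Xc, Pc = X - X.mean(axis=0), X_pos - X_pos.mean(axis=0)
    return top_k_eigvecs((Xc.T @ Pc + Pc.T @ Xc) / (2 * n), k)


def captured(U, V):
    """Fraction of span(U) captured by the orthonormal basis V, in [0, 1]."""
    return np.linalg.norm(U.T @ V) ** 2 / U.shape[1]


# Hypothetical toy model: x = sqrt(5) U z + sqrt(20) B w + noise, where the
# signal z is shared by both views and the background w is redrawn per view.
rng = np.random.default_rng(0)
n, d, k = 5000, 30, 2
Q, _ = np.linalg.qr(rng.standard_normal((d, 2 * k)))
U, B = Q[:, :k], Q[:, k:]  # orthonormal signal and background bases
z = rng.standard_normal((n, k))


def make_view():
    w = rng.standard_normal((n, k))  # background differs across the two views
    return np.sqrt(5.0) * z @ U.T + np.sqrt(20.0) * w @ B.T + rng.standard_normal((n, d))


X, X_pos = make_view(), make_view()
print("PCA :", captured(U, pca(X, X_pos, k)))
print("PCA+:", captured(U, pca_plus(X, X_pos, k)))
```

With background variance (20) well above signal variance (5), pooled PCA locks onto the background directions. The cross-covariance cancels the independently redrawn background in expectation, so in this sample-rich toy (n much larger than d) PCA+ still recovers the signal subspace.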

Abstract

High-dimensional data often contain low-dimensional signals obscured by structured background noise, which limits the effectiveness of standard PCA. Motivated by contrastive learning, we address the problem of recovering shared signal subspaces from positive pairs: paired observations that share the same signal but differ in background. Our baseline, PCA+, uses alignment-only contrastive learning and succeeds when background variation is mild, but fails under strong noise or in high-dimensional regimes. To address this, we introduce PCA++, a hard uniformity-constrained contrastive PCA that enforces identity covariance on projected features. PCA++ has a closed-form solution via a generalized eigenproblem, remains stable in high dimensions, and provably regularizes against background interference. We provide exact high-dimensional asymptotics in both fixed-aspect-ratio and growing-spike regimes, showing uniformity's role in robust signal recovery. Empirically, PCA++ outperforms standard PCA and alignment-only PCA+ on simulations, corrupted MNIST, and single-cell transcriptomics, reliably recovering condition-invariant structure. More broadly, we clarify uniformity's role in contrastive learning, showing that explicit feature dispersion defends against structured noise and enhances robustness.
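The hard uniformity constraint can be sketched as a generalized symmetric eigenproblem. The block below assumes PCA++ maximizes tr(V^T S_+ V) subject to V^T S V = I_k, where S_+ is the symmetrized cross-covariance of the positive pairs and S is the feature covariance pooled over both views. The small `ridge` term that keeps S positive definite when d > n is an assumption of this sketch and may differ from how the paper stabilizes the high-dimensional case; the function name `pca_plus_plus` and the toy data are hypothetical. `scipy.linalg.eigh(a, b)` returns eigenvectors normalized so that V^T S V = I, which is exactly the identity-covariance constraint on the projected features.

```python
import numpy as np
from scipy.linalg import eigh


def pca_plus_plus(X, X_pos, k, ridge=1e-3):
    """Sketch: maximize tr(V^T S_plus V) subject to V^T S V = I_k.

    Assumptions (not taken from the paper): S is pooled over both views,
    and a ridge term keeps S positive definite when d > n.
    """
    n, d = X.shape
    Xc = X - X.mean(axis=0)
    Pc = X_pos - X_pos.mean(axis=0)
    # Alignment term: symmetrized cross-covariance of the positive pairs.
    S_plus = (Xc.T @ Pc + Pc.T @ Xc) / (2 * n)
    # Uniformity constraint matrix: covariance pooled over both views, plus ridge.
    Z = np.vstack([Xc, Pc])
    S = Z.T @ Z / (2 * n) + ridge * np.eye(d)
    # Solve S_plus v = lam * S v; eigenvalues come back ascending and the
    # eigenvectors satisfy vecs.T @ S @ vecs = I.
    vals, vecs = eigh(S_plus, S)
    return vecs[:, ::-1][:, :k], vals[::-1][:k], S


# Hypothetical toy data: shared 2-d signal (variance 5) plus a strong 2-d
# background (variance 20) redrawn independently for each view.
rng = np.random.default_rng(0)
n, d, k = 5000, 30, 2
Q, _ = np.linalg.qr(rng.standard_normal((d, 2 * k)))
U, B = Q[:, :k], Q[:, k:]
z = rng.standard_normal((n, k))


def view():
    w = rng.standard_normal((n, k))
    return np.sqrt(5.0) * z @ U.T + np.sqrt(20.0) * w @ B.T + rng.standard_normal((n, d))


X, X_pos = view(), view()
V, lam, S = pca_plus_plus(X, X_pos, k)

# V^T S V equals the identity up to floating-point error (S includes the ridge).
print(np.round(V.T @ S @ V, 6))
# Fraction of the true signal subspace captured by span(V), in [0, 1].
Qv, _ = np.linalg.qr(V)
print(np.linalg.norm(U.T @ Qv) ** 2 / k)
```

By Cauchy-Schwarz, |v^T S_+ v| <= v^T S v for this pooled S, so the generalized eigenvalues lie in [-1, 1] and act as a per-direction sharing ratio. In the toy model the signal directions score about 5/6, while the high-variance background and the noise score near zero, so the constraint ranks directions by how much of their variance is shared across views rather than by raw variance.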
