Deflated HeteroPCA: Overcoming the curse of ill-conditioning in heteroskedastic PCA

Yuchen Zhou; Yuxin Chen

TL;DR

This work addresses subspace estimation for a low-rank matrix under heteroskedastic noise and highly unbalanced dimensions ($n_2 \gg n_1$), a setting in which the state-of-the-art HeteroPCA degrades as the condition number of the matrix grows. It introduces Deflated-HeteroPCA, which partitions the spectrum into well-conditioned, mutually well-separated blocks and applies HeteroPCA to each block in succession. The resulting guarantees are near-optimal and condition-number-free in both the spectral ($\ell_2$) and row-wise ($\ell_{2,\infty}$) norms, without narrowing the range of SNRs that can be handled. Applied to factor models and to tensor PCA (where the method serves as an initialization followed by HOOI), the theory yields improved guarantees, and numerical experiments show gains over diagonal-deleted PCA, vanilla SVD, and standard HeteroPCA.

Abstract

This paper is concerned with estimating the column subspace of a low-rank matrix $\boldsymbol{X}^\star \in \mathbb{R}^{n_1\times n_2}$ from contaminated data. How to obtain optimal statistical accuracy while accommodating the widest range of signal-to-noise ratios (SNRs) becomes particularly challenging in the presence of heteroskedastic noise and unbalanced dimensionality (i.e., $n_2\gg n_1$). While the state-of-the-art algorithm $\textsf{HeteroPCA}$ emerges as a powerful solution for solving this problem, it suffers from "the curse of ill-conditioning," namely, its performance degrades as the condition number of $\boldsymbol{X}^\star$ grows. In order to overcome this critical issue without compromising the range of allowable SNRs, we propose a novel algorithm, called $\textsf{Deflated-HeteroPCA}$, that achieves near-optimal and condition-number-free theoretical guarantees in terms of both $\ell_2$ and $\ell_{2,\infty}$ statistical accuracy. The proposed algorithm divides the spectrum of $\boldsymbol{X}^\star$ into well-conditioned and mutually well-separated subblocks, and applies $\textsf{HeteroPCA}$ to conquer each subblock successively. Further, an application of our algorithm and theory to two canonical examples -- the factor model and tensor PCA -- leads to remarkable improvement for each application.
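The blockwise strategy described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`top_r_eig`, `hetero_pca`, `deflated_hetero_pca`) are hypothetical, and the block boundaries are passed in by the caller as an increasing list `ranks`, whereas the paper selects them from the data with a rule that is not reproduced here. The inner routine follows the usual diagonal-imputation form of HeteroPCA: off-diagonal entries of the Gram matrix stay fixed, and the diagonal is repeatedly overwritten with that of the current rank-$r$ approximation.

```python
import numpy as np

def top_r_eig(G, r):
    """Top-r eigenpairs of a symmetric matrix, ordered by eigenvalue magnitude."""
    vals, vecs = np.linalg.eigh(G)
    idx = np.argsort(-np.abs(vals))[:r]
    return vals[idx], vecs[:, idx]

def hetero_pca(G_init, r, n_iter=100):
    """Diagonal imputation: keep off-diagonal entries, refresh the diagonal
    from the best rank-r approximation of the current iterate."""
    G = G_init.copy()
    for _ in range(n_iter):
        vals, vecs = top_r_eig(G, r)
        low_rank = (vecs * vals) @ vecs.T          # rank-r approximation of G
        np.fill_diagonal(G, np.diag(low_rank))     # only the diagonal changes
    _, U = top_r_eig(G, r)
    return U, G

def deflated_hetero_pca(Y, ranks, n_iter=100):
    """Run hetero_pca successively with increasing ranks (assumed given),
    warm-starting each stage from the previous stage's imputed Gram matrix."""
    G = Y @ Y.T
    np.fill_diagonal(G, 0.0)                       # start from the diagonal-deleted Gram matrix
    U = None
    for r_k in ranks:
        U, G = hetero_pca(G, r_k, n_iter)
    return U                                       # n1 x ranks[-1] subspace estimate
```

For a rank-2 matrix whose two singular values are far apart, a call such as `deflated_hetero_pca(Y, ranks=[1, 2])` first imputes the diagonal using only the dominant component and then refines it at the full rank, so that each stage works on a well-conditioned part of the spectrum.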

Paper Structure (74 sections, 15 theorems, 277 equations, 7 figures, 3 algorithms)

Introduction
Challenges: unbalanced dimensionality and heteroskedasticity
The curse of ill-conditioning
This paper
Paper organization.
Notation
Problem formulation
Models and assumptions.
Goal.
Algorithms
Review: SVD, diagonal-deleted PCA and HeteroPCA.
The proposed algorithm: Deflated-HeteroPCA.
Main theory
Spectral-norm-based statistical guarantees
Fine-grained $\ell_{2,\infty}$-norm-based statistical guarantees
...and 59 more sections

Key Result

Theorem 1

Suppose that Assumption assump:hetero holds. Assume that for some sufficiently large (resp. small) constant $C_0 > 0$ (resp. $c_0 > 0$). If the numbers of iterations obey for some large enough constant $C>0$, then with probability exceeding $1 - O(n^{-10})$, the output returned by Algorithm algorithm:sequential_heteroPCA satisfies Here, $r_0 = 0$, $r_1, \dots, r_{k_{\sf max}}$ are the ranks sel

Figures (7)

Figure 1: Subspace estimation error vs. condition number $\kappa$ of $\bm{\Sigma}^{\star}$. Here, we set $r = 2, n_1 = 200$ and $n_2 = 40,000$. The truth $\bm{X}^\star = \bm{U}^\star\bm{\Sigma}^\star\bm{V}^{\star\top}$ has rank 2 with $\bm{U}^\star \in \mathbb{R}^{n_1\times 2}$ and $\bm{V}^\star \in \mathbb{R}^{n_2\times 2}$ generated randomly. Plot (a) represents the noiseless case ($\bm{E}=\bm{0}$). In Plot (b), we choose the two singular values of $\bm{X}^{\star}$ as $\sigma_1^\star = \kappa\sigma_2^\star$ and $\sigma_2^\star = 200$, generate $\{\omega_i\}_{1\leq i\leq n_1}$ independently from $\mathsf{Unif}([0, 2])$, and draw the entries of $\bm{E}=[E_{i,j}]_{1\leq i\leq n_1, 1\leq j \leq n_2}$ independently such that $E_{i,j}\sim \mathcal{N}(0, \omega_i^2)$. We compare multiple subspace estimators here, where $\mathsf{HeteroPCA}$ is run with 100 iterations. For each estimator $\widehat{\bm{U}}$, we compute the spectral-norm-based error $\|\widehat{\bm{U}}\bm{R}_{\widehat{\bm{U}}} - \bm{U}^{\star}\|$ as $\kappa$ varies, where $\bm{R}_{\widehat{\bm{U}}} = \mathop{\rm arg\min}_{\bm{R} \in \mathcal{O}^{r, r}}\|\widehat{\bm{U}}\bm{R} - \bm{U}^\star\|_{{\mathrm{F}}}$; the results are averaged over 50 independent runs.
Figure 2: Estimation errors of $\bm{U}$ for Deflated-HeteroPCA, Diagonal-deleted PCA, HeteroPCA and Vanilla SVD for $r = 3$. Plot (a) (resp. (b)) reports the $\ell_2$ (resp. $\ell_{2,\infty}$) error vs. the noise level $\omega$ (where $n_1 = 100, n_2 = 1,000, \kappa = 5$). Plot (c) (resp. (d)) shows the $\ell_2$ (resp. $\ell_{2,\infty}$) error vs. the condition number $\kappa$ (where $n_1 = 100, n_2 = 1,000, \omega = 1$). Plot (e) (resp. (f)) displays the $\ell_2$ (resp. $\ell_{2,\infty}$) error vs. the column dimension $n_2$ (where $n_1 = 100, \kappa = 5, \omega = 1$).
Figure 3: Estimation errors of $\bm{U}$ for Deflated-HeteroPCA, Diagonal-deleted PCA, HeteroPCA and Vanilla SVD when $r = 5$. Plot (a) (resp. (b)) displays the $\ell_2$ (resp. $\ell_{2,\infty}$) error vs. the noise level $\omega$ (where $n_1 = 100, n_2 = 1,000, \kappa = 5$). Plot (c) (resp. (d)) shows the $\ell_2$ (resp. $\ell_{2,\infty}$) error vs. the condition number $\kappa$ (where $n_1 = 100, n_2 = 1,000, \omega = 1$). Plot (e) (resp. (f)) displays the $\ell_2$ (resp. $\ell_{2,\infty}$) error vs. the column dimension $n_2$ (where $n_1 = 100, \kappa = 5, \omega = 1$).
Figure 4: Estimation errors of $\bm{U}$ for Deflated-HeteroPCA, Diagonal-deleted PCA, HeteroPCA and Vanilla SVD under the factor model when $r = 3$. Plot (a) (resp. (b)) displays the $\ell_2$ (resp. $\ell_{2,\infty}$) error vs. the noise level $\omega$ (where $d = 100, n = 1,000, \kappa_{\sf pc} = 100$). Plot (c) (resp. (d)) shows the $\ell_2$ (resp. $\ell_{2,\infty}$) error vs. the condition number $\kappa_{\sf pc}$ (where $d = 100, n = 1,000, \omega = 1$). Plot (e) (resp. (f)) displays the $\ell_2$ (resp. $\ell_{2,\infty}$) error vs. the sample size $n$ (where $d = 100, \kappa_{\sf pc} = 100, \omega = 1$).
Figure 5: Estimation errors of $\bm{U}$ for Deflated-HeteroPCA, Diagonal-deleted PCA, HeteroPCA and Vanilla SVD under the Poisson PCA model. Plot (a) (resp. (b)) reports the $\ell_2$ (resp. $\ell_{2,\infty}$) error vs. $\lambda$ (where $n_1 = 100, n_2 = 1,000, r = 3$).
...and 2 more figures
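
The setup in the caption of Figure 1(b) and its rotation-aligned error metric can be sketched in NumPy as follows. The helper names `simulate_fig1b` and `aligned_spectral_error` are hypothetical, and two choices are assumptions rather than details from the paper: the random factors $\bm{U}^\star, \bm{V}^\star$ are drawn by orthonormalizing Gaussian matrices, and the default column dimension is smaller than the caption's $n_2 = 40,000$ to keep the example quick. The alignment step uses the standard closed form for the orthogonal Procrustes problem: if $\widehat{\bm{U}}^\top \bm{U}^\star = \bm{A}\bm{S}\bm{B}^\top$ is an SVD, the minimizer is $\bm{R} = \bm{A}\bm{B}^\top$.

```python
import numpy as np

def simulate_fig1b(kappa, n1=200, n2=2000, sigma2=200.0, rng=None):
    """Rank-2 truth plus row-wise heteroskedastic Gaussian noise (Figure 1(b) setup)."""
    rng = np.random.default_rng() if rng is None else rng
    # Assumption: orthonormal factors obtained by QR of Gaussian matrices.
    U_star, _ = np.linalg.qr(rng.standard_normal((n1, 2)))
    V_star, _ = np.linalg.qr(rng.standard_normal((n2, 2)))
    X_star = U_star @ np.diag([kappa * sigma2, sigma2]) @ V_star.T
    omega = rng.uniform(0.0, 2.0, size=n1)               # per-row noise levels
    E = omega[:, None] * rng.standard_normal((n1, n2))   # E[i, j] ~ N(0, omega[i]^2)
    return X_star + E, U_star

def aligned_spectral_error(U_hat, U_star):
    """Spectral norm of U_hat @ R - U_star, with R the Frobenius-optimal rotation."""
    A, _, Bt = np.linalg.svd(U_hat.T @ U_star)
    R = A @ Bt                                           # orthogonal Procrustes solution
    return np.linalg.norm(U_hat @ R - U_star, 2)

# Example: error of the vanilla SVD estimator at one condition number.
Y, U_star = simulate_fig1b(kappa=5.0, rng=np.random.default_rng(0))
U_svd = np.linalg.svd(Y, full_matrices=False)[0][:, :2]
err_svd = aligned_spectral_error(U_svd, U_star)
```

Averaging `err_svd` over independent draws and sweeping `kappa` gives one curve of the kind the caption describes; other estimators can be compared by swapping out the line that computes `U_svd`.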

Theorems & Definitions (19)

Theorem 1
Remark 2
Theorem 2
Corollary 1
Corollary 2
Theorem 3
Theorem 4
Lemma 1: xia2021normal, Theorem 1
Theorem 5
Lemma 2
...and 9 more
