Deflated HeteroPCA: Overcoming the curse of ill-conditioning in heteroskedastic PCA

Yuchen Zhou, Yuxin Chen

TL;DR

This work addresses column-subspace estimation for a low-rank matrix under heteroskedastic noise and highly unbalanced dimensions ($n_2 \gg n_1$), a regime in which the performance of the state-of-the-art HeteroPCA algorithm degrades as the condition number grows. It introduces Deflated-HeteroPCA, which partitions the spectrum into well-conditioned, mutually well-separated blocks and applies HeteroPCA to each block successively, achieving near-optimal, condition-number-free guarantees in both the $\ell_2$ and $\ell_{2,\infty}$ norms. These guarantees hold without narrowing the range of SNRs over which accurate subspace estimation is possible. The theory is applied to two canonical examples, the factor model and tensor PCA, where using the method for initialization followed by HOOI yields improved guarantees. Numerical experiments corroborate substantial gains over diagonal-deleted PCA, vanilla SVD, and standard HeteroPCA. Overall, Deflated-HeteroPCA provides a principled solution for ill-conditioned, heteroskedastic settings, with implications for high-dimensional statistics and tensor analysis.

Abstract

This paper is concerned with estimating the column subspace of a low-rank matrix $\boldsymbol{X}^\star \in \mathbb{R}^{n_1\times n_2}$ from contaminated data. How to obtain optimal statistical accuracy while accommodating the widest range of signal-to-noise ratios (SNRs) becomes particularly challenging in the presence of heteroskedastic noise and unbalanced dimensionality (i.e., $n_2\gg n_1$). While the state-of-the-art algorithm $\textsf{HeteroPCA}$ emerges as a powerful solution for solving this problem, it suffers from "the curse of ill-conditioning," namely, its performance degrades as the condition number of $\boldsymbol{X}^\star$ grows. In order to overcome this critical issue without compromising the range of allowable SNRs, we propose a novel algorithm, called $\textsf{Deflated-HeteroPCA}$, that achieves near-optimal and condition-number-free theoretical guarantees in terms of both $\ell_2$ and $\ell_{2,\infty}$ statistical accuracy. The proposed algorithm divides the spectrum of $\boldsymbol{X}^\star$ into well-conditioned and mutually well-separated subblocks, and applies $\textsf{HeteroPCA}$ to conquer each subblock successively. Further, an application of our algorithm and theory to two canonical examples -- the factor model and tensor PCA -- leads to remarkable improvement for each application.
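The successive, blockwise structure described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: the inner routine follows the commonly described HeteroPCA diagonal-imputation iteration (keep the off-diagonal entries of the Gram matrix, re-estimate the diagonal from a low-rank fit), and the rank schedule `ranks` is supplied by the caller, whereas the paper selects ranks in a data-driven way that is not reproduced here. The names `hetero_pca`, `deflated_hetero_pca`, `ranks`, and `n_iter` are hypothetical.

```python
import numpy as np


def hetero_pca(G_init, rank, n_iter):
    """Diagonal-imputation iterations on a symmetric matrix (illustrative sketch).

    Off-diagonal entries are never modified; only the diagonal is re-estimated.
    """
    G = G_init.copy()
    off_diag = ~np.eye(G.shape[0], dtype=bool)
    for _ in range(n_iter):
        # Best rank-`rank` approximation of symmetric G: keep the eigenpairs
        # with the largest absolute eigenvalues.
        eigvals, eigvecs = np.linalg.eigh(G)
        top = np.argsort(np.abs(eigvals))[::-1][:rank]
        low_rank = (eigvecs[:, top] * eigvals[top]) @ eigvecs[:, top].T
        # Keep off-diagonal entries of G, impute the diagonal from the fit.
        G = np.where(off_diag, G, low_rank)
    eigvals, eigvecs = np.linalg.eigh(G)
    top = np.argsort(np.abs(eigvals))[::-1][:rank]
    return G, eigvecs[:, top]


def deflated_hetero_pca(Y, ranks, n_iter=50):
    """Run hetero_pca over an increasing rank schedule, warm-starting each stage.

    ASSUMPTION: `ranks` is a caller-supplied increasing list such as [1, 3];
    the paper's own rank-selection rule is not implemented here.
    """
    gram = Y @ Y.T
    G = gram - np.diag(np.diag(gram))  # start from the diagonal-deleted Gram matrix
    U = None
    for r in ranks:
        # Each stage starts from the diagonal estimate left by the previous stage.
        G, U = hetero_pca(G, r, n_iter)
    return U
```

A call such as `deflated_hetero_pca(Y, ranks=[1, 2])` first fits the leading block and then reuses the imputed diagonal when fitting the full rank-2 subspace, which is the intuition behind handling one well-conditioned block at a time.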

Paper Structure

This paper contains 74 sections, 15 theorems, 277 equations, 7 figures, 3 algorithms.

Key Result

Theorem 1

Suppose that Assumption assump:hetero holds. Assume that for some sufficiently large (resp. small) constant $C_0 > 0$ (resp. $c_0 > 0$). If the numbers of iterations obey for some large enough constant $C>0$, then with probability exceeding $1 - O(n^{-10})$, the output returned by Algorithm algorithm:sequential_heteroPCA satisfies Here, $r_0 = 0$, $r_1, \dots, r_{k_{\sf max}}$ are the ranks sel

Figures (7)

  • Figure 1: Subspace estimation error vs. condition number $\kappa$ of $\bm{\Sigma}^{\star}$. Here, we set $r = 2, n_1 = 200$ and $n_2 = 40,000$. The truth $\bm{X}^\star = \bm{U}^\star\bm{\Sigma}^\star\bm{V}^{\star\top}$ has rank 2 with $\bm{U}^\star \in \mathbb{R}^{n_1\times 2}$ and $\bm{V}^\star \in \mathbb{R}^{n_2\times 2}$ generated randomly. Plot (a) represents the noiseless case ($\bm{E}=\bm{0}$). In Plot (b), we choose the two singular values of $\bm{X}^{\star}$ as $\sigma_1^\star = \kappa\sigma_2^\star$ and $\sigma_2^\star = 200$, generate $\{\omega_i\}_{1\leq i\leq n_1}$ independently from $\mathsf{Unif}([0, 2])$, and draw the entries of $\bm{E}=[E_{i,j}]_{1\leq i\leq n_1, 1\leq j \leq n_2}$ independently such that $E_{i,j}\sim \mathcal{N}(0, \omega_i^2)$. We compare multiple subspace estimators here, where $\mathsf{HeteroPCA}$ is run with 100 iterations. For each estimator $\widehat{\bm{U}}$, we compute the spectral-norm-based error $\|\widehat{\bm{U}}\bm{R}_{\widehat{\bm{U}}} - \bm{U}^{\star}\|$ as $\kappa$ varies, where $\bm{R}_{\widehat{\bm{U}}} = \mathop{\rm arg\min}_{\bm{R} \in \mathcal{O}^{r, r}}\|\widehat{\bm{U}}\bm{R} - \bm{U}^\star\|_{{\mathrm{F}}}$; the results are averaged over 50 independent runs.
  • Figure 2: Estimation errors of $\bm{U}$ for Deflated-HeteroPCA, Diagonal-deleted PCA, HeteroPCA and Vanilla SVD for $r = 3$. Plot (a) (resp. (b)) reports the $\ell_2$ (resp. $\ell_{2,\infty}$) error vs. the noise level $\omega$ (where $n_1 = 100, n_2 = 1,000, \kappa = 5$). Plot (c) (resp. (d)) shows the $\ell_2$ (resp. $\ell_{2,\infty}$) error vs. the condition number $\kappa$ (where $n_1 = 100, n_2 = 1,000, \omega = 1$). Plot (e) (resp. (f)) displays the $\ell_2$ (resp. $\ell_{2,\infty}$) error vs. the column dimension $n_2$ (where $n_1 = 100, \kappa = 5, \omega = 1$).
  • Figure 3: Estimation errors of $\bm{U}$ for Deflated-HeteroPCA, Diagonal-deleted PCA, HeteroPCA and Vanilla SVD when $r = 5$. Plot (a) (resp. (b)) displays the $\ell_2$ (resp. $\ell_{2,\infty}$) error vs. the noise level $\omega$ (where $n_1 = 100, n_2 = 1,000, \kappa = 5$). Plot (c) (resp. (d)) shows the $\ell_2$ (resp. $\ell_{2,\infty}$) error vs. the condition number $\kappa$ (where $n_1 = 100, n_2 = 1,000, \omega = 1$). Plot (e) (resp. (f)) displays the $\ell_2$ (resp. $\ell_{2,\infty}$) error vs. the column dimension $n_2$ (where $n_1 = 100, \kappa = 5, \omega = 1$).
  • Figure 4: Estimation errors of $\bm{U}$ for Deflated-HeteroPCA, Diagonal-deleted PCA, HeteroPCA and Vanilla SVD under the factor model \ref{model:PCA} when $r = 3$. Plot (a) (resp. (b)) displays the $\ell_2$ (resp. $\ell_{2,\infty}$) error vs. the noise level $\omega$ (where $d = 100, n = 1,000, \kappa_{\sf pc} = 100$). Plot (c) (resp. (d)) shows the $\ell_2$ (resp. $\ell_{2,\infty}$) error vs. the condition number $\kappa_{\sf pc}$ (where $d = 100, n = 1,000, \omega = 1$). Plot (e) (resp. (f)) displays the $\ell_2$ (resp. $\ell_{2,\infty}$) error vs. the sample size $n$ (where $d = 100, \kappa_{\sf pc} = 100, \omega = 1$).
  • Figure 5: Estimation errors of $\bm{U}$ for Deflated-HeteroPCA, Diagonal-deleted PCA, HeteroPCA and Vanilla SVD under the Poisson PCA model. Plot (a) (resp. (b)) reports the $\ell_2$ (resp. $\ell_{2,\infty}$) error vs. $\lambda$ (where $n_1 = 100, n_2 = 1,000, r = 3$).
  • ...and 2 more figures
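
The data model and error metrics in the Figure 1 caption can be sketched in a few lines. This is an illustration only: the caption says $\bm{U}^\star$ and $\bm{V}^\star$ are "generated randomly" without giving the mechanism, so drawing them via QR of a Gaussian matrix is an assumption, and the function names `simulate`, `spectral_error`, and `two_to_infty_error` are hypothetical. The rotation $\bm{R}_{\widehat{\bm{U}}}$ is computed with the standard orthogonal Procrustes solution.

```python
import numpy as np


def random_orthonormal(n, r, rng):
    # ASSUMPTION: "generated randomly" is modeled as QR of a Gaussian matrix.
    Q, _ = np.linalg.qr(rng.standard_normal((n, r)))
    return Q


def procrustes_rotation(U_hat, U_star):
    # R = argmin over orthogonal R of ||U_hat R - U_star||_F.
    A, _, Bt = np.linalg.svd(U_hat.T @ U_star)
    return A @ Bt


def spectral_error(U_hat, U_star):
    # Spectral-norm error ||U_hat R - U_star|| after alignment.
    R = procrustes_rotation(U_hat, U_star)
    return np.linalg.norm(U_hat @ R - U_star, ord=2)


def two_to_infty_error(U_hat, U_star):
    # l_{2,infty} error: largest Euclidean row norm of U_hat R - U_star.
    R = procrustes_rotation(U_hat, U_star)
    return np.linalg.norm(U_hat @ R - U_star, axis=1).max()


def simulate(n1, n2, kappa, sigma2=200.0, rng=None):
    """Rank-2 signal plus row-wise heteroskedastic Gaussian noise (Figure 1, Plot (b))."""
    rng = np.random.default_rng() if rng is None else rng
    U_star = random_orthonormal(n1, 2, rng)
    V_star = random_orthonormal(n2, 2, rng)
    Sigma = np.diag([kappa * sigma2, sigma2])   # sigma_1 = kappa * sigma_2
    X_star = U_star @ Sigma @ V_star.T
    omega = rng.uniform(0.0, 2.0, size=n1)      # omega_i ~ Unif([0, 2])
    E = rng.standard_normal((n1, n2)) * omega[:, None]  # E_ij ~ N(0, omega_i^2)
    return X_star + E, U_star


# Example: vanilla SVD baseline on a smaller instance than the caption's n2 = 40,000.
Y, U_star = simulate(n1=200, n2=2000, kappa=5.0, rng=np.random.default_rng(0))
U_svd = np.linalg.svd(Y, full_matrices=False)[0][:, :2]
err_svd = spectral_error(U_svd, U_star)
```

Averaging `spectral_error` over independent draws for each value of `kappa` would mirror the kind of curve the caption describes; the smaller `n2` here is chosen only to keep the example quick to run.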

Theorems & Definitions (19)

  • Theorem 1
  • Remark 2
  • Theorem 2
  • Corollary 1
  • Corollary 2
  • Theorem 3
  • Theorem 4
  • Lemma 1: xia2021normal, Theorem 1
  • Theorem 5
  • Lemma 2
  • ...and 9 more