Deflated HeteroPCA: Overcoming the curse of ill-conditioning in heteroskedastic PCA
Yuchen Zhou, Yuxin Chen
TL;DR
This work addresses subspace estimation for a low-rank matrix under heteroskedastic noise and highly unbalanced dimensions, a setting in which existing spectral methods, including HeteroPCA, degrade as the condition number of the matrix grows. It introduces Deflated-HeteroPCA, which partitions the spectrum into well-conditioned, mutually well-separated blocks and applies HeteroPCA to the blocks successively, using deflation to control bias. The resulting guarantees are near-optimal and free of the condition number in both the $\ell_2$ and $\ell_{2,\infty}$ norms. Applications to the factor model and to tensor PCA, where the method serves as an initialization followed by HOOI, yield improved guarantees. The approach broadens the range of SNRs over which accurate subspace estimation is possible, and numerical experiments show substantial gains over diagonal-deleted PCA, vanilla SVD, and standard HeteroPCA.
Abstract
This paper is concerned with estimating the column subspace of a low-rank matrix $\boldsymbol{X}^\star \in \mathbb{R}^{n_1\times n_2}$ from contaminated data. Obtaining optimal statistical accuracy while accommodating the widest range of signal-to-noise ratios (SNRs) becomes particularly challenging in the presence of heteroskedastic noise and unbalanced dimensionality (i.e., $n_2\gg n_1$). While the state-of-the-art algorithm $\textsf{HeteroPCA}$ is a powerful solution to this problem, it suffers from "the curse of ill-conditioning," namely, its performance degrades as the condition number of $\boldsymbol{X}^\star$ grows. To overcome this critical issue without compromising the range of allowable SNRs, we propose a novel algorithm, called $\textsf{Deflated-HeteroPCA}$, that achieves near-optimal and condition-number-free theoretical guarantees in terms of both $\ell_2$ and $\ell_{2,\infty}$ statistical accuracy. The proposed algorithm divides the spectrum of $\boldsymbol{X}^\star$ into well-conditioned and mutually well-separated subblocks, and applies $\textsf{HeteroPCA}$ to conquer each subblock successively. Further, applying our algorithm and theory to two canonical examples (the factor model and tensor PCA) leads to improved guarantees for each.
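The divide-and-conquer idea described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`hetero_pca`, `next_block_end`, `deflated_hetero_pca`), the conditioning cap of 4, the gap rule `sigma / r`, and the fixed iteration count are all assumptions chosen for illustration. The inner routine follows the usual diagonal-imputation form of HeteroPCA, in which the off-diagonal entries of the Gram matrix are kept and the diagonal is refreshed from a low-rank fit.

```python
import numpy as np


def top_eig(G, rank):
    """Return the `rank` eigenpairs of symmetric G with largest magnitude."""
    w, V = np.linalg.eigh(G)
    idx = np.argsort(np.abs(w))[::-1][:rank]
    return w[idx], V[:, idx]


def hetero_pca(G_init, rank, n_iter=50):
    """Diagonal imputation: off-diagonals stay fixed, the diagonal is
    repeatedly replaced by the diagonal of the rank-`rank` approximation."""
    G = G_init.copy()
    for _ in range(n_iter):
        w, V = top_eig(G, rank)
        low_rank = (V * w) @ V.T
        np.fill_diagonal(G, np.diag(low_rank))
    _, U = top_eig(G, rank)
    return U, G


def next_block_end(sigma, start, r, cond_cap=4.0):
    """Hypothetical block rule (an assumption, not taken from the paper):
    pick the largest j in (start, r] such that the block sigma[start:j]
    has condition number <= cond_cap and is separated from sigma[j] by a
    gap of at least sigma[j-1] / r. Fall back to r if no j qualifies."""
    best = r
    found = False
    for j in range(start + 1, r + 1):
        well_conditioned = sigma[start] <= cond_cap * sigma[j - 1]
        well_separated = sigma[j - 1] - sigma[j] >= sigma[j - 1] / r
        if well_conditioned and well_separated:
            best, found = j, True
    return best if found else r


def deflated_hetero_pca(Y, r, n_iter=50):
    """Estimate an r-dimensional column subspace of Y block by block."""
    G = Y @ Y.T
    np.fill_diagonal(G, 0.0)  # diagonal carries the heteroskedastic bias
    rank = 0
    U = None
    while rank < r:
        sigma = np.sort(np.abs(np.linalg.eigvalsh(G)))[::-1]
        rank = next_block_end(sigma, rank, r)  # strictly increases
        U, G = hetero_pca(G, rank, n_iter)     # warm start from current G
    return U
```

Each pass enlarges the working rank only as far as the current spectrum looks well-conditioned and well-separated, so the imputed diagonal from earlier blocks is reused when fitting later, weaker blocks. The sketch assumes `r` is smaller than the number of rows of `Y`.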
