On Minimum Trace Factor Analysis -- An Old Song Sung to a New Tune
C. Li, A. Shkolnik
TL;DR
The paper addresses robust low-rank covariance estimation under heteroskedastic noise and missing data by introducing a relaxed version of minimum trace factor analysis (MTFA). The relaxation is a convex program that minimizes $F(L,D)=\tau\|L\|_*+\tfrac12\|\Sigma-(L+D)\|_F^2$ subject to $L\succeq0$ and $D$ diagonal, solved by a fixed-point alternating scheme that uses eigenvalue soft-thresholding and yields a unique solution. The authors establish deterministic $\sin\Theta$ subspace recovery bounds, obtain minimax-rate guarantees under factor models, and show that the method remains robust to ill-conditioning, while the PSD constraint avoids Heywood cases. They also connect the framework to Lasso, HeteroPCA, and Soft-Impute. In the reported experiments, the relaxed MTFA often outperforms PCA-based approaches and existing heteroskedastic-noise methods across varying noise levels, missing-data rates, and condition numbers.
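To make the alternating scheme concrete, here is a minimal NumPy sketch of one way to minimize $F(L,D)$ by block-coordinate updates. It is an illustration under stated assumptions, not the authors' implementation: the function names, the zero initialization, the stopping rule, and the parameters `n_iter` and `tol` are all hypothetical, and the paper's exact update order may differ. The sketch relies on two standard facts. For $L\succeq0$ the nuclear norm equals the trace, so with $D$ fixed the minimizer over $L$ soft-thresholds the eigenvalues of $\Sigma-D$ at $\tau$ and clips them at zero. With $L$ fixed, the minimizer over diagonal $D$ is the diagonal of $\Sigma-L$.

```python
import numpy as np

def eig_soft_threshold(M, tau):
    """Shrink the eigenvalues of a symmetric matrix by tau and clip at zero."""
    M = (M + M.T) / 2.0                      # guard against round-off asymmetry
    vals, vecs = np.linalg.eigh(M)
    shrunk = np.maximum(vals - tau, 0.0)     # result is PSD by construction
    return (vecs * shrunk) @ vecs.T

def objective(Sigma, L, d, tau):
    """F(L, D) = tau * tr(L) + 0.5 * ||Sigma - L - D||_F^2 for PSD L, D = diag(d)."""
    resid = Sigma - L - np.diag(d)
    return tau * np.trace(L) + 0.5 * np.sum(resid ** 2)

def relaxed_mtfa_sketch(Sigma, tau, n_iter=500, tol=1e-10):
    """Alternate exact minimization over L and over the diagonal d (hypothetical sketch)."""
    p = Sigma.shape[0]
    d = np.zeros(p)                          # assumed initialization
    L = np.zeros((p, p))
    for _ in range(n_iter):
        # L-step: eigenvalue soft-thresholding of Sigma - D.
        L_new = eig_soft_threshold(Sigma - np.diag(d), tau)
        # D-step: match the diagonal of Sigma - L exactly.
        d = np.diag(Sigma) - np.diag(L_new)
        converged = np.linalg.norm(L_new - L) <= tol * max(1.0, np.linalg.norm(L))
        L = L_new
        if converged:
            break
    return L, d

# Toy usage: a rank-2 signal plus heteroskedastic diagonal noise (synthetic data).
rng = np.random.default_rng(0)
B = rng.standard_normal((15, 2))
Sigma_demo = B @ B.T + np.diag(rng.uniform(0.5, 5.0, size=15))
L_hat, d_hat = relaxed_mtfa_sketch(Sigma_demo, tau=0.1)
print("objective:", objective(Sigma_demo, L_hat, d_hat, 0.1))
```

Because each step is an exact minimization over one block, the objective cannot increase from one iteration to the next. The sketch does not constrain the entries of `d` to be nonnegative, and it does not handle missing data.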
Abstract
Dimensionality reduction methods, such as principal component analysis (PCA) and factor analysis, are central to many problems in data science. There are, however, serious and well-understood challenges to finding robust low dimensional approximations for data with significant heteroskedastic noise. This paper introduces a relaxed version of Minimum Trace Factor Analysis (MTFA), a convex optimization method with roots dating back to the work of Ledermann in 1940. This relaxation is particularly effective at not overfitting to heteroskedastic perturbations and addresses the commonly cited Heywood cases in factor analysis and the recently identified "curse of ill-conditioning" for existing spectral methods. We provide theoretical guarantees on the accuracy of the resulting low rank subspace and the convergence rate of the proposed algorithm to compute that matrix. We develop a number of interesting connections to existing methods, including HeteroPCA, Lasso, and Soft-Impute, to fill an important gap in the already large literature on low rank matrix estimation. Numerical experiments benchmark our results against several recent proposals for dealing with heteroskedastic noise.
