Optimal vintage factor analysis with deflation varimax

Xin Bing, Xin He, Dian Jin, Yuqian Zhang

TL;DR

This work advances vintage factor analysis by introducing a deflation varimax procedure that computes the rotation row-by-row via projected gradient descent, enabling provable guarantees for recovering the loading matrix Λ from X = ΛZ + E. By establishing a decomposition of the principal components and analyzing the nonconvex rotation landscape, the authors show minimax-optimal rates for Λ estimation and provide initialization schemes (random and method-of-moments) with finite-sample guarantees. They further propose structured-noise improvements, including eigenvalue-based bias reduction and gradient corrections, which yield faster convergence and optimal rates across SNR regimes. The methods are validated on real data (MNIST digits and image recovery) and supported by extensive simulations, demonstrating practical benefits for learning meaningful, rotated bases and sparse representations.

Abstract

Vintage factor analysis is an important type of factor analysis that first finds a low-dimensional representation of the original data and then seeks a rotation such that the rotated low-dimensional representation is scientifically meaningful. The most widely used form of vintage factor analysis is Principal Component Analysis (PCA) followed by the varimax rotation. Despite its popularity, few theoretical guarantees are available to date, mainly because the varimax rotation requires solving a non-convex optimization problem over the set of orthogonal matrices. In this paper, we propose a deflation varimax procedure that solves for each row of an orthogonal matrix sequentially. Beyond the procedure's net computational gain and flexibility, we are able to fully establish theoretical guarantees for it in a broader context. Adopting this new deflation varimax as the second step after PCA, we further analyze this two-step procedure under a general class of factor models. Our results show that it estimates the factor loading matrix at the minimax optimal rate when the signal-to-noise ratio (SNR) is moderate or large. In the low SNR regime, we offer a possible improvement over PCA followed by deflation varimax when the additive noise under the factor model is structured. The modified procedure is shown to be minimax optimal in all SNR regimes. Our theory is valid for finite samples and allows the number of latent factors to grow with the sample size, as well as the ambient dimension to grow with, or even exceed, the sample size. Extensive simulations and real data analysis further corroborate our theoretical findings.
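To make the two-step idea concrete, the following is a minimal sketch of PCA followed by a row-by-row rotation fitted with projected gradient ascent. It is not the authors' algorithm: it assumes a fourth-moment (kurtosis-type) surrogate for the varimax criterion, heavy-tailed factors (Laplace here), a random initialization, and a classical Gram-Schmidt-style deflation. The function name `deflation_l4_rotation`, the step size, the iteration count, and the simulated dimensions are all hypothetical choices made for illustration.

```python
import numpy as np

def deflation_l4_rotation(U, n_iter=500, step=0.5, seed=0):
    """Fit an r x r orthogonal matrix Q one row at a time (illustrative sketch).

    U is an (r, n) array of whitened PC scores, i.e. U @ U.T / n is the identity.
    Each row q ascends mean((q @ U) ** 4) on the unit sphere while staying
    orthogonal to the rows found earlier. This assumes factors with positive
    excess kurtosis; it is a stand-in for the paper's criterion, not a copy.
    """
    rng = np.random.default_rng(seed)
    r, n = U.shape
    Q = np.zeros((r, r))
    for k in range(r):
        # Projector onto the orthogonal complement of the rows already fitted.
        P = np.eye(r) - Q[:k].T @ Q[:k]
        q = P @ rng.standard_normal(r)
        q /= np.linalg.norm(q)
        for _ in range(n_iter):
            # Gradient of (1 / (4 n)) * sum_i (q^T u_i)^4 with respect to q.
            grad = U @ (q @ U) ** 3 / n
            # Ascent step, then project back onto the constraint set.
            q = P @ (q + step * grad)
            q /= np.linalg.norm(q)
        Q[k] = q
    return Q

# Simulated data from X = Lam Z + E (all sizes below are arbitrary).
rng = np.random.default_rng(1)
p, r, n, sigma = 20, 3, 5000, 0.1
Lam = rng.standard_normal((p, r))
Z = rng.laplace(scale=1 / np.sqrt(2), size=(r, n))  # unit-variance factors
X = Lam @ Z + sigma * rng.standard_normal((p, n))

# Step 1: PCA via the SVD; the scaled right singular vectors are whitened scores.
Uu, d, Vt = np.linalg.svd(X, full_matrices=False)
scores = np.sqrt(n) * Vt[:r]

# Step 2: rotate the scores row by row, then map the rotation to the loadings.
Q = deflation_l4_rotation(scores)
Z_hat = Q @ scores
Lam_hat = (Uu[:, :r] * d[:r]) @ Q.T / np.sqrt(n)
```

By construction `Lam_hat @ Z_hat` equals the rank-r PCA approximation of `X`, so the rotation only changes how that approximation is split between loadings and factors. The rows of `Z_hat` can match the true factors only up to sign and permutation, which is the usual identifiability limit for rotated factor models.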

Paper Structure

This paper contains 62 sections, 50 theorems, 442 equations, 7 figures, and 2 algorithms.

Key Result

Theorem 1

Under Assumptions ass_Z, ass_A_general and ass_E_general, assume there exists some sufficiently small constant $c>0$ such that $\epsilon^2 \le c$, $r\log (n) \le c n$ and $\epsilon^2 p \log(n) \le c n$. Then there exists some $\mathbf R\in\mathbb{O}_{r\times r}$ such that, with probability at least … where $\mathbf N = \mathbf S^{-1}\mathbf L^\top \mathbf E / \sigma$ and $\mathbf \Omega =$ …

Figures (7)

  • Figure 1: The 49 basis vectors learned by PCA (left) and PCA-dVarimax (right).
  • Figure 2: Ten selected pairs of unrotated PCs (left) and rotated PCs (right).
  • Figure 3: The averaged estimation errors of PCA-dVarimax coupled with different initialization schemes.
  • Figure 4: The averaged estimation errors of PCA-dVarimax with different initialization schemes at various numbers of iterations.
  • Figure 5: The averaged estimation errors of each procedure.
  • ...and 2 more figures

Theorems & Definitions (111)

  • Remark 1: Comparison with the classical deflation procedure
  • Theorem 1
  • proof
  • Lemma 1
  • proof
  • Theorem 2
  • proof
  • Theorem 3
  • Remark 2: Choice of the step size
  • Remark 3: Effect of the initialization
  • ...and 101 more