
On the Error-Propagation of Inexact Hotelling's Deflation for Principal Component Analysis

Fangshuo Liao, Junhyung Lyle Kim, Cruz Barnum, Anastasios Kyrillidis

TL;DR

The paper analyzes how numerical errors from an inexact PCA sub-routine propagate through Hotelling's deflation when estimating the top $K$ eigenvectors of a symmetric covariance matrix. It provides two main results: (i) a sub-routine-agnostic bound showing that deflation errors grow exponentially with $K$, at a rate governed by the eigengaps; and (ii) a tighter bound when the sub-routine is power iteration, which exploits directional information about the error to obtain a more favorable exponential growth rate. The theoretical tools include Weyl's inequality, the Davis–Kahan $\sin\Theta$ theorem, a recursive analysis of the deflated matrices, and Neumann expansions that connect perturbations across steps. The results yield explicit iteration counts that power iteration needs to reach a target accuracy, and they are complemented by empirical illustrations on MNIST-based spectral clustering. Overall, the work clarifies the practical limits of multi-component PCA via deflation under inexact computation and guides how to allocate computational resources for reliable eigenvector recovery.
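The paper's own iteration-count requirements are not reproduced in this summary. As a point of reference only, the sketch below computes the step count implied by the classical single-vector power-iteration bound $\tan\theta_t \le (|\lambda_2|/|\lambda_1|)^t \tan\theta_0$, assuming an exact (error-free) input matrix. The function name `power_iteration_steps` is hypothetical. For one exactly deflated step of a covariance matrix the relevant ratio would be $\lambda_{k+1}^*/\lambda_k^*$; the paper's bounds additionally account for error propagated from earlier deflation steps, which this sketch ignores.

```python
import math


def power_iteration_steps(lam1: float, lam2: float, tan_theta0: float, eps: float) -> int:
    """Smallest t with (|lam2| / |lam1|)**t * tan_theta0 <= eps.

    Uses the classical bound for power iteration on a symmetric matrix whose
    two largest-magnitude eigenvalues are lam1 and lam2 (|lam1| > |lam2|).
    This is NOT the paper's deflation-aware bound.
    """
    if abs(lam1) <= abs(lam2):
        raise ValueError("need |lam1| > |lam2| for power iteration to converge")
    if tan_theta0 <= eps:
        return 0  # the starting vector is already accurate enough
    ratio = abs(lam2) / abs(lam1)
    if ratio == 0.0:
        return 1  # a single multiplication removes every other direction
    return math.ceil(math.log(tan_theta0 / eps) / math.log(1.0 / ratio))
```

With the spectrum $\lambda_k^* = 1/k$ used in the paper's synthetic figures, the ratio $k/(k+1)$ approaches 1 as $k$ grows, so `power_iteration_steps(1/10, 1/11, 1.0, 1e-3)` already needs 73 steps, compared with 10 for `power_iteration_steps(1.0, 0.5, 1.0, 1e-3)`.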

Abstract

Principal Component Analysis (PCA) aims to find subspaces spanned by the so-called principal components that best represent the variance in the dataset. The deflation method is a popular meta-algorithm that sequentially finds individual principal components, starting from the most important ones and working towards the less important ones. However, as deflation proceeds, numerical errors from the imprecise estimation of principal components propagate because of its sequential nature. This paper mathematically characterizes the error propagation of the inexact Hotelling's deflation method. We consider two scenarios: (i) when the sub-routine for finding the leading eigenvector is abstract and can represent various algorithms; and (ii) when power iteration is used as the sub-routine. In the latter case, the additional directional information from power iteration allows us to obtain a tighter error bound than in the sub-routine-agnostic case. For both scenarios, we explicitly characterize how the errors progress and affect subsequent principal component estimations.
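A minimal NumPy sketch of inexact Hotelling's deflation with power iteration as the sub-routine is given below. It is written in the spirit of the paper's Algorithm 1, whose exact steps are not reproduced in this summary. Assumptions, all labeled in the comments: a random Gaussian starting vector, a fixed number `t` of power steps per component, and a Rayleigh-quotient eigenvalue estimate. The names `power_iteration` and `hotelling_deflation` are hypothetical.

```python
import numpy as np


def power_iteration(A, t, rng):
    """Run t steps of power iteration on symmetric A from a random unit vector."""
    x = rng.standard_normal(A.shape[0])  # assumption: Gaussian random start
    x /= np.linalg.norm(x)
    for _ in range(t):
        x = A @ x
        x /= np.linalg.norm(x)
    return x


def hotelling_deflation(Sigma, K, t, seed=0):
    """Estimate the top-K eigenpairs of symmetric Sigma by inexact Hotelling's deflation.

    Each leading eigenvector is only approximated (t power steps), so its error
    is carried into every later deflated matrix.
    """
    rng = np.random.default_rng(seed)
    S = np.array(Sigma, dtype=float)  # Sigma_1 = Sigma (a copy)
    vecs, vals = [], []
    for _ in range(K):
        u = power_iteration(S, t, rng)  # inexact leading eigenvector of Sigma_k
        lam = float(u @ S @ u)          # assumption: Rayleigh-quotient eigenvalue
        vecs.append(u)
        vals.append(lam)
        S = S - lam * np.outer(u, u)    # Sigma_{k+1} = Sigma_k - lam_k u_k u_k^T
    return np.column_stack(vecs), np.array(vals)
```

On a well-separated spectrum such as $(6, 5, 4, 3, 2, 1)$ with `t=200`, the recovered vectors agree with the true eigenvectors up to sign; shrinking `t` or the eigengaps is what lets the per-step errors compound, which is the effect the paper quantifies.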

Paper Structure

This paper contains 22 sections, 24 theorems, 182 equations, 4 figures, and 1 algorithm.

Key Result

Lemma 2.2

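Let ${\mathbf{M}},{\mathbf{M}}^*\in\mathbb{R}^{d\times d}$ be real symmetric matrices. Let $\sigma_j,\sigma_j^*$ be the $j$-th largest eigenvalue of ${\mathbf{M}}$ and ${\mathbf{M}}^*$, respectively. Then, for every $j\in\{1,\dots,d\}$,

$$\left|\sigma_j - \sigma_j^*\right| \leq \left\|{\mathbf{M}} - {\mathbf{M}}^*\right\|_2.$$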
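Weyl's inequality says that matching eigenvalues of two symmetric matrices, sorted in the same order, differ by at most the spectral norm of their difference. The following small NumPy check is illustrative only; the helper name `weyl_gap` is hypothetical.

```python
import numpy as np


def weyl_gap(M, M_star):
    """Return (max_j |sigma_j - sigma_j^*|, ||M - M_star||_2) for symmetric M, M_star."""
    sig = np.linalg.eigvalsh(M)            # eigenvalues in ascending order
    sig_star = np.linalg.eigvalsh(M_star)  # same order, so the j-th entries are matched
    lhs = np.max(np.abs(sig - sig_star))
    rhs = np.linalg.norm(M - M_star, 2)    # spectral norm (largest singular value)
    return lhs, rhs
```

In the deflation analysis, this is the kind of bound that turns a perturbation of the deflated matrix into a perturbation of its eigenvalues, and hence of the eigengaps.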

Figures (4)

  • Figure 1: Spectral clustering of the MNIST dataset using the inexact deflation method in Algorithm 1. As the number of power iteration steps increases ($x$-axis), clustering performance, measured by mutual information, also improves. A similar pattern holds across different numbers of recovered eigenvectors.
  • Figure 2: Dynamics of $\left\|\left(\bm{\Sigma}_k - \bm{\Sigma}_k^*\right){\mathbf{u}}_j^*\right\|_2$ as $k$ varies. Each ${\mathbf{u}}_j^*$ is shown in a different color, lighter for small $j$ and darker for large $j$. Experiments are performed for $\bm{\Sigma}\in\mathbb{R}^{100\times 100}$, $\lambda_k^* = \frac{1}{k}$, and $\left\{{\mathbf{u}}_k^*\right\}_{k=1}^d$ a randomly generated orthonormal basis, with $t = 200$. The orthonormal basis $\left\{{\mathbf{u}}_k^*\right\}_{k=1}^d$ is generated by sampling a matrix with i.i.d. Gaussian entries and computing its left singular vectors.
  • Figure 3: Comparison between the dynamics of ${\mathbf{u}}_k^\top{\mathbf{u}}_j^*$ and ${\mathbf{v}}_k^\top{\mathbf{u}}_j^*$ as $k$ varies, for $j\in\{25, 50, 75, 100\}$. Experiments are performed for $\bm{\Sigma}\in\mathbb{R}^{100\times 100}$, $\lambda_k^* = \frac{1}{k}$, and $\left\{{\mathbf{u}}_k^*\right\}_{k=1}^d$ a randomly generated orthonormal basis, with $t = 200$. The results show that both $\left|{\mathbf{u}}_k^\top{\mathbf{u}}_j^*\right|$ and $\left|{\mathbf{v}}_k^\top{\mathbf{u}}_j^*\right|$ are small unless $k$ is near $j$. The orthonormal basis $\left\{{\mathbf{u}}_k^*\right\}_{k=1}^d$ is generated by sampling a matrix with i.i.d. Gaussian entries and computing its left singular vectors.
  • Figure 4: Comparison between the dynamics of $\left|{\mathbf{u}}_k^\top\left(\bm{\Sigma}_k - \bm{\Sigma}_k^*\right){\mathbf{u}}_k^*\right|$ and $\left\|\left(\bm{\Sigma}_k - \bm{\Sigma}_k^*\right){\mathbf{u}}_j^*\right\|_2$ as $k$ varies, for $j\in\{25, 50, 75, 100\}$. Experiments are performed for $\bm{\Sigma}\in\mathbb{R}^{100\times 100}$, $\lambda_k^* = \frac{1}{k}$, and $\left\{{\mathbf{u}}_k^*\right\}_{k=1}^d$ a randomly generated orthonormal basis, with $t = 200$. The results show that $\left\|\left(\bm{\Sigma}_k - \bm{\Sigma}_k^*\right){\mathbf{u}}_j^*\right\|_2$ is a good approximation of $\left|{\mathbf{u}}_k^\top\left(\bm{\Sigma}_k - \bm{\Sigma}_k^*\right){\mathbf{u}}_k^*\right|$ when the latter is large. The orthonormal basis $\left\{{\mathbf{u}}_k^*\right\}_{k=1}^d$ is generated by sampling a matrix with i.i.d. Gaussian entries and computing its left singular vectors.
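The synthetic setup shared by Figures 2 to 4 can be sketched roughly as below. It builds $\bm{\Sigma} = \mathbf{U}^*\operatorname{diag}(1/k)\,\mathbf{U}^{*\top}$ from the left singular vectors of a Gaussian matrix and tracks $\|(\bm{\Sigma}_k - \bm{\Sigma}_k^*)\mathbf{u}_j^*\|_2$. This is an illustrative sketch, not the authors' experiment code. It assumes that $\bm{\Sigma}_k^*$ is the exactly deflated matrix $\bm{\Sigma} - \sum_{i<k}\lambda_i^*\mathbf{u}_i^*\mathbf{u}_i^{*\top}$, that $t$ is the number of power steps per component, and that eigenvalues are estimated by the Rayleigh quotient. The function names are hypothetical.

```python
import numpy as np


def figure_setup(d=100, seed=0):
    """Sigma = U* diag(1/k) U*^T, with U* the left singular vectors of a Gaussian matrix."""
    rng = np.random.default_rng(seed)
    U_star, _, _ = np.linalg.svd(rng.standard_normal((d, d)))
    lam_star = 1.0 / np.arange(1, d + 1)  # lambda_k^* = 1/k
    Sigma = U_star @ np.diag(lam_star) @ U_star.T
    return Sigma, U_star, lam_star


def deflation_errors(Sigma, U_star, lam_star, K, t, seed=1):
    """Row i (0-based) holds ||(Sigma_{i+1} - Sigma_{i+1}^*) u_j^*||_2 for every j."""
    rng = np.random.default_rng(seed)
    S = Sigma.copy()       # inexact deflated matrix Sigma_k
    S_star = Sigma.copy()  # assumption: exact deflation with the true eigenpairs
    err = np.zeros((K, Sigma.shape[0]))
    for k in range(K):
        # column j of (S - S_star) @ U_star is (Sigma_k - Sigma_k^*) u_j^*
        err[k] = np.linalg.norm((S - S_star) @ U_star, axis=0)
        x = rng.standard_normal(S.shape[0])
        x /= np.linalg.norm(x)
        for _ in range(t):  # t steps of power iteration on Sigma_k
            x = S @ x
            x /= np.linalg.norm(x)
        lam = x @ S @ x     # assumption: Rayleigh-quotient eigenvalue estimate
        S = S - lam * np.outer(x, x)
        S_star = S_star - lam_star[k] * np.outer(U_star[:, k], U_star[:, k])
    return err
```

Using the default `d=100` with `deflation_errors(Sigma, U_star, lam_star, K=100, t=200)` matches the sizes quoted in the captions, and plotting each column of `err` against $k$ gives one curve per $\mathbf{u}_j^*$. The numbers depend on the random seeds and are not the paper's results.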

Theorems & Definitions (31)

  • Lemma 2.2: Weyl's Inequality (Weyl, 1912)
  • Lemma 2.3: $\sin\Theta$ Theorem (Davis and Kahan, 1970)
  • Theorem 3.1
  • Corollary 3.2
  • Corollary 3.3
  • Lemma 3.4
  • Lemma 3.5
  • Theorem 4.1
  • Corollary 4.2
  • Lemma 4.3
  • ...and 21 more
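The exact form of Lemma 2.3 used in the paper is not shown in this summary. For intuition, the sketch below checks a widely used Davis–Kahan variant due to Yu, Wang and Samworth (2015) for the top eigenvector: $\sin\theta(\mathbf{u}_1, \mathbf{u}_1^*) \le 2\|\mathbf{M} - \mathbf{M}^*\|_2 / (\lambda_1^* - \lambda_2^*)$. It is an assumed stand-in, not necessarily the statement in the paper. The helper name `sin_theta_top` is hypothetical.

```python
import numpy as np


def sin_theta_top(M, M_star):
    """Return (sin of angle between top eigenvectors, 2 ||M - M_star||_2 / eigengap of M_star)."""
    w_star, V_star = np.linalg.eigh(M_star)  # ascending eigenvalues
    _, V = np.linalg.eigh(M)
    u_star, u = V_star[:, -1], V[:, -1]      # eigenvectors of the largest eigenvalues
    cos = min(1.0, abs(float(u @ u_star)))   # abs() removes the sign ambiguity
    sin = np.sqrt(1.0 - cos**2)
    gap = w_star[-1] - w_star[-2]            # lambda_1^* - lambda_2^*
    bound = 2.0 * np.linalg.norm(M - M_star, 2) / gap
    return sin, bound
```

The eigengap in the denominator is why small gaps, such as those between $1/k$ and $1/(k+1)$ for large $k$, make each deflation step more sensitive to the error inherited from earlier steps.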