Dimension estimation in PCA model using high-dimensional data augmentation

Una Radojicic, Joni Virta

TL;DR

The paper tackles latent-dimension estimation in PCA for high-dimensional data. It first analyzes predictor augmentation and shows that the original approach can be inconsistent when both the data dimension and the augmentation dimension grow with the sample size. It then proposes a high-dimensional predictor augmentation (HDPA) that debiases the spike eigenvalues of the original data, adjusts the eigenvector-norm information, and identifies the latent dimension by a jump in a purpose-built criterion, with consistency proved under mild conditions. The authors derive theoretical results on the limiting behavior of the augmented eigenstructure in high dimensions and illustrate that, unlike the original method, HDPA remains reliable across a broad range of $\gamma_p$ and $\gamma_r$ regimes. Simulations demonstrate substantial improvements over competing methods, including robustness to non-Gaussian data, and the paper offers practical guidance on noise-variance estimation and augmentation tuning.

Abstract

We propose a modified, high-dimensional version of a recent dimension estimation procedure that determines the dimension via the introduction of augmented noise variables into the data. Our asymptotic results show that the proposal is consistent in wide high-dimensional scenarios, and further shed light on why the original method breaks down when the dimension of either the data or the augmentation becomes too large. Simulations are used to demonstrate the superiority of the proposal to competitors both under and outside of the theoretical model.
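The basic augmentation idea mentioned in the abstract can be illustrated with a small sketch. This is not the HDPA procedure itself (which, per the summary above, also debiases eigenvalues and adjusts the eigenvector norms); it only shows, under assumed toy settings, that eigenvectors tied to signal directions put almost no weight on appended noise variables, while noise eigenvectors do. The function name `augmented_block_norms`, the toy spiked data, and all parameter values are assumptions made for illustration.

```python
import numpy as np

def augmented_block_norms(X, r, rng):
    """Squared norm of the augmented (last r) coordinates of each eigenvector.

    Hypothetical helper for illustration: appends r independent N(0, 1)
    noise columns to X and eigendecomposes the sample covariance.
    """
    n, p = X.shape
    X_aug = np.hstack([X, rng.standard_normal((n, r))])
    S = np.cov(X_aug, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(S)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]         # reorder to descending
    eigvecs = eigvecs[:, order]
    # Rows p.. of each eigenvector correspond to the appended noise variables.
    return np.sum(eigvecs[p:, :] ** 2, axis=0)

rng = np.random.default_rng(0)
n, p, d, r = 2000, 10, 3, 10                  # assumed toy sizes, n >> p + r

# Toy data: unit-variance noise with d inflated ("spiked") coordinates.
X = rng.standard_normal((n, p))
X[:, :d] *= np.array([6.0, 5.0, 4.0])

norms = augmented_block_norms(X, r, rng)
print("signal eigenvectors:", np.round(norms[:d], 4))
print("noise eigenvectors (mean):", round(float(norms[d:].mean()), 3))
```

In this low-dimensional toy setting the first `d` norms are close to zero and the remaining ones are not, which is the contrast the augmentation approach exploits. The paper's point is that this simple picture can fail when the data or augmentation dimension grows with the sample size, which is what motivates the modified procedure.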

Paper Structure

This paper contains 8 sections, 6 theorems, 28 equations, 3 figures.

Key Result

Theorem 1

Fix a constant $K \in \mathbb{N}$ such that $K > d$. Under the paper's main assumption (label assu:main_assumption), we have, as $n \rightarrow \infty$,

Figures (3)

  • Figure 1: Graphical illustration of the results of the theorem labeled thm:LuoLi_inconsistency.
  • Figure 2: Average proportion of the wrong estimates in $1000$ replicates across $20$ different settings for Models 1 and 2.
  • Figure 3: The average augmentation curves $\phi_n$ over $500$ replicates. The solid lines indicate the limiting values.

Theorems & Definitions (14)

  • Theorem 1
  • Remark 1
  • Theorem 2
  • Remark 2
  • Theorem 3
  • Theorem 4
  • Corollary 1
  • Remark 3
  • Lemma 1
  • Proof of Theorem 1
  • ...and 4 more