TransPCA for Large-dimensional Factor Analysis with Weak Factors: Power Enhancement via Knowledge Transfer

Yong He, Dong Liu, Yunjing Sun, Yalin Wang

TL;DR

The paper tackles slow convergence of principal component estimators in large-dimensional factor models when weak factors are present. It introduces TransPCA, a transfer-learning PCA that aggregates information across many informative auxiliary panels via a weighted average of loading-space projections, boosting estimation accuracy for weak factors. The authors establish convergence rates for weak/strong loadings and factor scores, propose TransED to determine the number of weak factors, and provide a practical dataset-selection criterion to avoid negative transfer. Empirical studies on macroeconomic/finance data demonstrate substantial gains over target-alone PCA, and the framework offers a scalable, flexible approach for leveraging abundant auxiliary data in high-dimensional factor analysis.

Abstract

Early work established convergence of the principal component estimators of the factors and loadings, up to a rotation, for large-dimensional approximate factor models with weak factors, in the sense that the factor loading $\Lambda^{(0)}$ scales sublinearly in the number $N$ of cross-section units, i.e., $\Lambda^{(0)\top}\Lambda^{(0)}/N^{\alpha}$ is positive definite in the limit for some $\alpha\in (0,1)$. However, the established convergence rates for weak factors can be much slower, especially for small $\alpha$. This article proposes a Transfer Principal Component Analysis (TransPCA) method that enhances the convergence rates for weak factors by transferring knowledge from the large number of informative panel datasets that are often available and should not be ignored in the big-data era. We aggregate useful information by analyzing a weighted average projection matrix of the estimated loading spaces from all informative datasets, an approach that is highly flexible and computationally efficient. Theoretically, we derive the convergence rates of the estimators of the weak/strong loading spaces and the factor scores. The results indicate that, as long as the auxiliary datasets are similar enough to the target dataset and the auxiliary sample size is sufficiently large, TransPCA estimators can achieve faster convergence rates than PCA performed solely on the target dataset. To avoid negative transfer, we also investigate the case in which the informative datasets are unknown and provide a criterion for selecting useful datasets. Thorough simulation studies and empirical analyses of real datasets in macroeconomics and finance, where a large number of source panel datasets are naturally available, illustrate the usefulness of the proposed methods.
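The aggregation step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' algorithm: the function names (`loading_space`, `trans_pca_sketch`, `subspace_distance`), the default weights proportional to sample size, and the projection-based distance are all assumptions made for illustration, and the paper's separate treatment of strong and weak factors is not reproduced here.

```python
import numpy as np


def loading_space(X, r):
    """Orthonormal basis (N x r) for the top-r eigenvectors of the
    sample covariance of a T x N panel X (plain PCA on one dataset)."""
    T = X.shape[0]
    cov = X.T @ X / T
    _, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    return eigvecs[:, ::-1][:, :r]        # keep the r leading directions


def trans_pca_sketch(target, auxiliaries, r0, r_aux, weights=None):
    """Hypothetical sketch: average the projection matrices of the
    estimated loading spaces, then extract the leading r0 directions."""
    panels = [target] + list(auxiliaries)
    ranks = [r0] + list(r_aux)
    if weights is None:
        # Assumption: weight each panel by its share of the total sample size.
        sizes = np.array([p.shape[0] for p in panels], dtype=float)
        weights = sizes / sizes.sum()
    N = target.shape[1]
    avg_proj = np.zeros((N, N))
    for X, r, w in zip(panels, ranks, weights):
        U = loading_space(X, r)
        avg_proj += w * (U @ U.T)         # weighted projection onto span(U)
    _, eigvecs = np.linalg.eigh(avg_proj)
    basis = eigvecs[:, ::-1][:, :r0]      # aggregated loading-space estimate
    scores = target @ basis               # target factor scores, up to rotation
    return basis, scores


def subspace_distance(U, V):
    """Projection-based distance between the column spaces of two
    orthonormal bases; assumed here, may differ from the paper's metric."""
    q = max(U.shape[1], V.shape[1])
    overlap = np.trace(U @ U.T @ V @ V.T) / q
    return float(np.sqrt(max(0.0, 1.0 - overlap)))
```

Working with projection matrices rather than the loading estimates themselves sidesteps the rotation indeterminacy: each panel identifies its loadings only up to a rotation, but the projection onto the loading space is invariant to that rotation, so the projections can be averaged directly.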

Paper Structure

This paper contains 22 sections, 10 theorems, 47 equations, 3 figures, 7 tables, 2 algorithms.

Key Result

Theorem 3.1

(General stochastic order for estimation errors) Under Assumptions factors to Identifiability and NT-relation.(1), with the numbers of factors $s$ and $r_k$, $k\in \{0\}\cup \mathcal{A}$, fixed and given, as $N, \min_{k\in\{0\}\cup \mathcal{A}} T_k \rightarrow \infty$, we have

Figures (3)

  • Figure 1: Averaged estimation errors of $\mathcal{D}\left(\mathcal{M}\left(\widehat{\bm{\Lambda}}^{(0)}\right),\mathcal{M}\left(\bm{\Lambda}^{(0)}\right)\right)$ (left figure) and MSE (right figure) for Scenario ($\bm{b}$) using different methods with $T_k=(200, 300, 400)$ when $K=4$, $N=50$, $T_0=50$.
  • Figure 2: Time series plots for Housing Starts (South) and Avg Weekly Hours : Goods-Producing. The gray dashed vertical lines are the change point locations.
  • Figure 3: Portfolio value curves using TransPCA and target-only methods with $r_0=2$ (left) and $r_0=3$ (right).

Theorems & Definitions (13)

  • Remark 1
  • Theorem 3.1
  • Corollary 3.1
  • Remark 2
  • Proposition 1
  • Remark 3
  • Theorem 3.2
  • Theorem 4.1
  • Theorem 4.2
  • Theorem 4.3
  • ...and 3 more