TransPCA for Large-dimensional Factor Analysis with Weak Factors: Power Enhancement via Knowledge Transfer

Yong He; Dong Liu; Yunjing Sun; Yalin Wang

TL;DR

The paper tackles slow convergence of principal component estimators in large-dimensional factor models when weak factors are present. It introduces TransPCA, a transfer-learning PCA that aggregates information across many informative auxiliary panels via a weighted average of loading-space projections, boosting estimation accuracy for weak factors. The authors establish convergence rates for weak/strong loadings and factor scores, propose TransED to determine the number of weak factors, and provide a practical dataset-selection criterion to avoid negative transfer. Empirical studies on macroeconomic/finance data demonstrate substantial gains over target-alone PCA, and the framework offers a scalable, flexible approach for leveraging abundant auxiliary data in high-dimensional factor analysis.

Abstract

Early work established convergence of the principal component estimators of the factors and loadings, up to a rotation, for large-dimensional approximate factor models with weak factors, in the sense that the factor loading $Λ^{(0)}$ scales sublinearly in the number $N$ of cross-section units, i.e., $Λ^{(0)\top}Λ^{(0)}/N^α$ is positive definite in the limit for some $α\in (0,1)$. However, the established convergence rates for weak factors can be much slower, especially for small $α$. This article proposes a Transfer Principal Component Analysis (TransPCA) method that enhances the convergence rates for weak factors by transferring knowledge from a large number of available informative panel datasets, a resource that should not be ignored in the big data era. We aggregate useful information by analyzing a weighted average of the projection matrices of the estimated loading spaces from all informative datasets, an approach that is highly flexible and computationally efficient. Theoretically, we derive the convergence rates of the estimators of the weak/strong loading spaces and the factor scores. The results indicate that, as long as the auxiliary datasets are similar enough to the target dataset and the auxiliary sample size is sufficiently large, TransPCA estimators achieve faster convergence rates than PCA performed solely on the target dataset. To avoid negative transfer, we also investigate the case in which the informative datasets are unknown and provide a criterion for selecting useful datasets. Thorough simulation studies and empirical analyses of real datasets in macroeconomics and finance, where a large number of source panel datasets are naturally available, illustrate the usefulness of the proposed methods.
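The aggregation step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `loading_space` and `transpca_sketch`, the choice to include the target panel in the average, and the default weights proportional to each panel's sample size are all assumptions made here for illustration. The abstract only states that a weighted average of the projection matrices of the estimated loading spaces is analyzed.

```python
import numpy as np


def loading_space(X, r):
    """Orthonormal basis (N x r) for the PCA-estimated loading space of a T x N panel."""
    T = X.shape[0]
    cov = X.T @ X / T                       # N x N sample second-moment matrix
    _, eigvecs = np.linalg.eigh(cov)        # eigenvalues returned in ascending order
    return eigvecs[:, ::-1][:, :r]          # keep the r leading eigenvectors


def transpca_sketch(target, auxiliaries, r, weights=None):
    """Average projection matrices across panels, then re-extract an r-dim space.

    Assumptions (not from the source): the target panel is included in the
    average, and default weights are proportional to each panel's sample size.
    """
    panels = [target] + list(auxiliaries)
    if weights is None:
        sizes = np.array([X.shape[0] for X in panels], dtype=float)
        weights = sizes / sizes.sum()

    N = target.shape[1]
    avg_proj = np.zeros((N, N))
    for w, X in zip(weights, panels):
        V = loading_space(X, r)
        avg_proj += w * (V @ V.T)           # projection onto that panel's loading space

    _, eigvecs = np.linalg.eigh(avg_proj)
    basis = eigvecs[:, ::-1][:, :r]         # leading eigenvectors of the averaged projection
    factors = target @ basis                # project the target panel onto the pooled space
    return basis, factors
```

Averaging projection matrices rather than the loading estimates themselves sidesteps the rotation indeterminacy of PCA, since a projection matrix depends only on the column space and not on the particular basis each panel returns.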
