
High-dimensional inference for single-index model with latent factors

Yanmei Shi, Meiling Hao, Yanlin Tang, Heng Lian, Xu Guo

TL;DR

This work tackles high-dimensional regression with latent factors by introducing the Factor Augmented Sparse Single Index Model (FASIM), which captures nonlinear covariate–response relationships while accounting for latent structure. It develops a fast factor-adequacy test (FAST) based on a score-type statistic and a Gaussian multiplier bootstrap that avoids estimating high-dimensional coefficients or precision matrices. When the factor model is deemed adequate, the paper proposes regularized estimation followed by a debiased inference procedure that yields valid coefficient-wise confidence intervals without imposing moment conditions on the error term. The approach is robust to heavy-tailed errors and outliers, with strong finite-sample performance in simulations and in a real-data macroeconomic analysis (FRED-MD). Overall, the framework provides scalable, robust tools for inference in high-dimensional factor-augmented settings and offers theoretical guarantees for both testing and estimation.
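The critical value of FAST comes from a Gaussian multiplier bootstrap. This page does not give the exact score, so the sketch below makes an assumption. It uses a generic max-type statistic $T=\max_j |n^{-1/2}\sum_i s_{ij}|$ built from a hypothetical $n\times p$ matrix of per-observation score contributions $s_{ij}$. One example would be products of estimated idiosyncratic components and residuals from a factor-only fit.

Each bootstrap replicate works as follows:

1. Draw multipliers $e_i \sim \mathrm{N}(0,1)$.
2. Recompute the maximum over the centered scores.
3. Use the $(1-\alpha)$ quantile of the replicates as the critical value.

The function names are illustrative and do not come from the paper.

```python
import math
import random


def max_score_statistic(scores):
    """T = max_j |n^{-1/2} * sum_i s_ij| for an n x p list-of-lists `scores`."""
    n, p = len(scores), len(scores[0])
    col_sums = [sum(row[j] for row in scores) for j in range(p)]
    return max(abs(s) for s in col_sums) / math.sqrt(n)


def multiplier_bootstrap_cv(scores, alpha=0.05, n_boot=1000, seed=0):
    """(1 - alpha) empirical quantile of the Gaussian multiplier bootstrap.

    Each replicate draws e_1, ..., e_n iid N(0, 1) and computes
    max_j |n^{-1/2} * sum_i e_i * (s_ij - mean_j)|.
    """
    rng = random.Random(seed)
    n, p = len(scores), len(scores[0])
    means = [sum(row[j] for row in scores) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in scores]
    boot = []
    for _ in range(n_boot):
        e = [rng.gauss(0.0, 1.0) for _ in range(n)]
        stat = max(abs(sum(e[i] * centered[i][j] for i in range(n))) for j in range(p))
        boot.append(stat / math.sqrt(n))
    boot.sort()
    # index of the (1 - alpha) order statistic, clamped to a valid position
    k = max(0, min(n_boot - 1, math.ceil((1 - alpha) * n_boot) - 1))
    return boot[k]


# Usage on synthetic null scores (n = 200 observations, p = 5 coordinates).
data_rng = random.Random(1)
null_scores = [[data_rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(200)]
T = max_score_statistic(null_scores)
cv = multiplier_bootstrap_cv(null_scores, alpha=0.05)
print(f"T = {T:.3f}, critical value = {cv:.3f}, reject H0: {T > cv}")
```

Only column sums and Gaussian draws are needed, so no $p\times p$ precision matrix is estimated. That is in the spirit of the stated motivation. Whether the paper centers the scores in exactly this way is an assumption.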

Abstract

Models with latent factors have recently attracted considerable attention. However, most investigations focus on linear regression models and therefore cannot capture nonlinearity. To address this issue, we propose a novel Factor Augmented Single-Index Model. We first address whether it is necessary to include the augmented part by introducing a score-type test statistic. Unlike previous test statistics, ours requires estimating neither the high-dimensional regression coefficients nor a high-dimensional precision matrix, which makes it simpler to implement. We also propose a Gaussian multiplier bootstrap to determine the critical value. The validity of our procedure is established theoretically under suitable conditions. We further investigate penalized estimation of the regression model and, with estimated latent factors, establish error bounds for the estimators. Lastly, we introduce a debiased estimator and construct confidence intervals for individual coefficients based on its asymptotic normality. Our proposal imposes no moment condition on the error term, so our procedures work well when the random error follows a heavy-tailed distribution or when outliers are present. We demonstrate the finite-sample performance of the proposed method through comprehensive numerical studies and an application to the FRED-MD macroeconomic dataset.
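This page gives no formula for the debiasing step. As a hedged reference point, the generic one-step (desparsified Lasso) construction is written out below. Whether the paper uses exactly this form is an assumption, and so is the choice of design matrix and transformed response.

The notation is as follows:

- $\mathbf{X}$ is an $n\times p$ design, hypothetically the estimated idiosyncratic components.
- $\boldsymbol{h}$ is the vector of transformed responses.
- $\widehat{\boldsymbol{\beta}}_{h}$ is the penalized estimator of the target $\boldsymbol{\beta}_{h}$.
- $\widehat{\boldsymbol{\Sigma}}=\mathbf{X}^{\top}\mathbf{X}/n$.
- $\widehat{\boldsymbol{\Theta}}$ is an approximate inverse of $\widehat{\boldsymbol{\Sigma}}$, for example from nodewise Lasso.
- $\widehat{\sigma}^2$ estimates the variance of $\boldsymbol{e}$; homoscedasticity is assumed for simplicity.

```latex
\begin{aligned}
\tilde{\boldsymbol{\beta}}_{h}
  &= \widehat{\boldsymbol{\beta}}_{h}
   + \frac{1}{n}\,\widehat{\boldsymbol{\Theta}}\,\mathbf{X}^{\top}
     \bigl(\boldsymbol{h} - \mathbf{X}\widehat{\boldsymbol{\beta}}_{h}\bigr),
  \qquad \boldsymbol{e} = \boldsymbol{h} - \mathbf{X}\boldsymbol{\beta}_{h},\\
\tilde{\boldsymbol{\beta}}_{h} - \boldsymbol{\beta}_{h}
  &= \underbrace{\frac{1}{n}\,\widehat{\boldsymbol{\Theta}}\,\mathbf{X}^{\top}\boldsymbol{e}}_{\text{asymptotically normal}}
   \;-\; \underbrace{\bigl(\widehat{\boldsymbol{\Theta}}\widehat{\boldsymbol{\Sigma}} - \mathbf{I}_p\bigr)
     \bigl(\widehat{\boldsymbol{\beta}}_{h} - \boldsymbol{\beta}_{h}\bigr)}_{\text{remainder, negligible under sparsity}},\\
\mathrm{CI}_{j}
  &= \tilde{\beta}_{h,j} \pm z_{1-\alpha/2}\,
     \frac{\widehat{\sigma}\,\bigl(\widehat{\boldsymbol{\Theta}}\widehat{\boldsymbol{\Sigma}}\widehat{\boldsymbol{\Theta}}^{\top}\bigr)_{jj}^{1/2}}{\sqrt{n}}.
\end{aligned}
```

The second line follows by substituting $\boldsymbol{h}=\mathbf{X}\boldsymbol{\beta}_{h}+\boldsymbol{e}$ into the first. When $h$ is bounded, as with a rank transform, $\mathrm{Var}(e)$ is finite whatever the tails of the original error. This is one way a procedure can avoid moment conditions on the error.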

Paper Structure (23 sections, 6 theorems, 63 equations, 8 figures, 4 tables)


Key Result

Proposition 2.1

Assume that $\mathbb{E}\left(\boldsymbol{v} \mid \boldsymbol{v}^{\top}\boldsymbol{\eta}\right)$ is a linear function of $\boldsymbol{v}^{\top}\boldsymbol{\eta}$. Then $\boldsymbol{\eta}_{h}$ is proportional to $\boldsymbol{\eta}$, that is, $\boldsymbol{\eta}_{h}=\kappa_{h} \times \boldsymbol{\eta}$ f…
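The statement of Proposition 2.1 is truncated on this page, and the definitions of $\boldsymbol{v}$ and $\boldsymbol{\eta}_h$ are not shown. The sketch below therefore rests on several hypothetical choices:

- $\boldsymbol{\eta}_h$ is taken to be the least-squares slope of a transformed response $h(Y)$ on $\boldsymbol{v}$.
- $h$ is the rank (empirical-CDF) transform.
- $\boldsymbol{v}$ is drawn from $\mathrm{N}(\mathbf{0}, \mathbf{I}_3)$, for which $\mathbb{E}(\boldsymbol{v}\mid\boldsymbol{v}^{\top}\boldsymbol{\eta})$ is linear in $\boldsymbol{v}^{\top}\boldsymbol{\eta}$.
- The link $\exp(t/2)$, the Cauchy noise and all function names are illustrative.

If the proportionality holds, the cosine between the estimated slope and $\boldsymbol{\eta}$ should be close to 1, even though the noise has no finite mean.

```python
import math
import random


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting (small systems)."""
    m = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(m):
        piv = max(range(c, m), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(c + 1, m):
            f = aug[r][c] / aug[c][c]
            for k in range(c, m + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        tail = sum(aug[r][k] * x[k] for k in range(r + 1, m))
        x[r] = (aug[r][m] - tail) / aug[r][r]
    return x


def ecdf_transform(y):
    """h(y_i) = rank(y_i) / n: bounded in (0, 1], so no moments of y are required."""
    n = len(y)
    h = [0.0] * n
    for rank, i in enumerate(sorted(range(n), key=y.__getitem__), start=1):
        h[i] = rank / n
    return h


def ls_slope(v, h):
    """Least-squares slope vector of centered h on centered v."""
    n, p = len(v), len(v[0])
    vm = [sum(row[j] for row in v) / n for j in range(p)]
    hm = sum(h) / n
    vc = [[row[j] - vm[j] for j in range(p)] for row in v]
    gram = [[sum(r[a] * r[b] for r in vc) / n for b in range(p)] for a in range(p)]
    cross = [sum(vc[i][a] * (h[i] - hm) for i in range(n)) / n for a in range(p)]
    return solve(gram, cross)


def cosine(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))


# Simulation: v ~ N(0, I_3) satisfies the linearity condition for any eta.
rng = random.Random(2024)
eta = [1.0, 2.0, -1.0]
n = 5000
v = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(n)]
# Y = exp(v'eta / 2) + standard Cauchy noise (inverse-CDF draw; no finite mean).
y = [
    math.exp(sum(a * b for a, b in zip(row, eta)) / 2)
    + math.tan(math.pi * (rng.random() - 0.5))
    for row in v
]
eta_h = ls_slope(v, ecdf_transform(y))
print("eta_h =", [round(c, 4) for c in eta_h], "cosine with eta =", round(cosine(eta_h, eta), 4))
```

Only the direction of $\boldsymbol{\eta}_h$ is recovered. The scale $\kappa_h$ depends on the unknown link and on $h$, which is why single-index coefficients are identified up to a constant.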

Figures (8)

  • Figure 1: Power curves under the linear model with $p=200$ and $\boldsymbol{\beta}=\omega\ast\left(\mathbf{1}_{3},\mathbf{0}_{p-3}\right)^{\top}$. "FAST_i" and "FAST_ii" denote the FAST proposed in this paper, and "FabTest_i" and "FabTest_ii" denote the FabTest of fanJ2023, under settings (i) and (ii) of the $\boldsymbol{F}$ generation, respectively. The first row shows results for the original data and the second row shows results after adding outliers. The first column corresponds to errors following $\mathrm{N}(0, 0.25)$ and the second column to errors following $\mathrm{t}_3$.
  • Figure 2: Power curves of the linear model with $p=500$ and $\boldsymbol{\beta}=\omega\ast\left(\mathbf{1}_{3},\mathbf{0}_{p-3}\right)^{\top}$. "FAST_i" and "FAST_ii" denote the FAST proposed in this paper, and "FabTest_i" and "FabTest_ii" denote the FabTest of fanJ2023, under settings (i) and (ii) of the $\boldsymbol{F}$ generation, respectively. The first row shows results for the original data and the second row shows results after adding outliers. The first column corresponds to errors following $\mathrm{N}(0, 0.25)$ and the second column to errors following $\mathrm{t}_3$.
  • Figure 3: Power curves of the non-linear model with $p=200$ and $\boldsymbol{\beta}=\omega\ast\left(\mathbf{1}_{3},\mathbf{0}_{p-3}\right)^{\top}$. "FAST_i" and "FAST_ii" denote the FAST proposed in this paper, and "FabTest_i" and "FabTest_ii" denote the FabTest of fanJ2023, under settings (i) and (ii) of the $\boldsymbol{F}$ generation, respectively. The first row shows results for the original data and the second row shows results after adding outliers. The first column corresponds to errors following $\mathrm{N}(0, 0.25)$ and the second column to errors following $\mathrm{t}_3$.
  • Figure 4: Power curves of the non-linear model with $p=500$ and $\boldsymbol{\beta}=\omega\ast\left(\mathbf{1}_{3},\mathbf{0}_{p-3}\right)^{\top}$. "FAST_i" and "FAST_ii" denote the FAST proposed in this paper, and "FabTest_i" and "FabTest_ii" denote the FabTest of fanJ2023, under settings (i) and (ii) of the $\boldsymbol{F}$ generation, respectively. The first row shows results for the original data and the second row shows results after adding outliers. The first column corresponds to errors following $\mathrm{N}(0, 0.25)$ and the second column to errors following $\mathrm{t}_3$.
  • Figure 5: Relative errors of $\widehat{\boldsymbol{\beta}}_{h}$ and $\widehat{\boldsymbol{\beta}}$. Panels (a), (b) and (c) show the estimation results for the linear model with noise $\varepsilon$ following $\mathrm{N}(0, 1)$, $\mathrm{Unif}(-{3}/{2}, {3}/{2})$ and $\mathrm{t}_3$, respectively. "FASIM_Lasso", "SIM_Lasso" and "FA_Lasso" denote, respectively, three relative errors:
    • the error for $\boldsymbol{\beta}_{h}$ under the FASIM of this paper;
    • the error under the SIM of rejchel2020rank, which does not incorporate the factor effect;
    • the error for $\boldsymbol{\beta}$ under the FARM of fanJ2023.

Theorems & Definitions (12)

  • Proposition 2.1
  • Remark 1
  • Theorem 3.1
  • Theorem 3.2
  • Theorem 3.3
  • Theorem 4.1
  • Theorem 4.2
  • Proof
  • Proof
  • Proof