Variable Selection and Minimax Prediction in High-dimensional Functional Linear Model

Xingche Guo, Yehua Li, Tailen Hsing

TL;DR

The paper develops an RKHS-based functional elastic-net for high-dimensional functional linear models, in which each predictor is an infinite-dimensional function. It proves existence and uniqueness of the estimator, derives non-asymptotic tail bounds for variable selection consistency under a functional irrepresentable condition, and shows that a post-selection refined estimator achieves the oracle minimax prediction rate even when the number of true predictors grows with the sample size. A scalable reduced-rank algorithm with a block-coordinate scheme is proposed and validated through extensive simulations and a real Human Connectome Project dataset, where 33 brain ROIs are consistently linked to fluid intelligence. The work contributes both theory and computation for ultra-high-dimensional functional regression, with practical tools for variable selection and minimax-optimal prediction in infinite-dimensional settings.

Abstract

High-dimensional functional data have become increasingly prevalent in modern applications such as high-frequency financial data and neuroimaging data analysis. We investigate a class of high-dimensional linear regression models, where each predictor is a random element in an infinite-dimensional function space, and the number of functional predictors $p$ can potentially be ultra-high. Assuming that each of the unknown coefficient functions belongs to some reproducing kernel Hilbert space (RKHS), we regularize the fitting of the model by imposing a group elastic-net type of penalty on the RKHS norms of the coefficient functions. We show that our loss function is Gâteaux sub-differentiable, and our functional elastic-net estimator exists uniquely in the product RKHS. Under suitable sparsity assumptions and a functional version of the irrepresentable condition, we derive a non-asymptotic tail bound for variable selection consistency of our method. Allowing the number of true functional predictors $q$ to diverge with the sample size, we also show a post-selection refined estimator can achieve the oracle minimax optimal prediction rate. The proposed methods are illustrated through simulation studies and a real-data application from the Human Connectome Project.
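
To make the idea of a group elastic-net penalty concrete, the following is a minimal finite-dimensional sketch, not the authors' RKHS estimator or their reduced-rank algorithm. It assumes each functional predictor has already been reduced to a small block of columns (for example, basis scores), and it uses a hypothetical parametrization in which `lam` scales the whole penalty and `alpha` balances the group-lasso term against the ridge term. The function names `group_enet_objective` and `group_enet_bcd` are illustrative only. Each block is updated by a proximal gradient step, which is one standard way to run block-coordinate descent for this type of penalty.

```python
import numpy as np


def group_enet_objective(X_groups, y, coefs, lam, alpha):
    """(1/2n)||y - sum_j X_j b_j||^2 + lam * sum_j [alpha*||b_j|| + (1-alpha)/2*||b_j||^2]."""
    n = y.shape[0]
    fitted = sum(X @ b for X, b in zip(X_groups, coefs))
    loss = 0.5 * np.sum((y - fitted) ** 2) / n
    penalty = sum(
        alpha * np.linalg.norm(b) + 0.5 * (1.0 - alpha) * np.sum(b ** 2)
        for b in coefs
    )
    return loss + lam * penalty


def group_enet_bcd(X_groups, y, lam, alpha, n_iter=200):
    """Cyclic block-coordinate descent with one proximal gradient step per block.

    X_groups: list of (n, d_j) arrays, one block per predictor (assumed layout).
    Returns a list of coefficient vectors; an all-zero block means "not selected".
    """
    y = np.asarray(y, dtype=float)
    n = y.shape[0]
    coefs = [np.zeros(X.shape[1]) for X in X_groups]
    # Block-wise Lipschitz constants of the least-squares gradient:
    # squared spectral norm of X_j divided by n.
    lips = [np.linalg.norm(X, 2) ** 2 / n for X in X_groups]
    resid = y.copy()  # residual y - sum_j X_j b_j, kept up to date
    for _ in range(n_iter):
        for j, X in enumerate(X_groups):
            L = lips[j]
            if L == 0.0:
                continue  # an all-zero block cannot explain anything
            grad = -X.T @ resid / n
            z = coefs[j] - grad / L
            z_norm = np.linalg.norm(z)
            thresh = lam * alpha / L
            if z_norm <= thresh:
                new = np.zeros_like(z)  # whole group is thresholded to zero
            else:
                # Group soft-thresholding followed by ridge shrinkage.
                new = (1.0 - thresh / z_norm) * z / (1.0 + lam * (1.0 - alpha) / L)
            resid -= X @ (new - coefs[j])
            coefs[j] = new
    return coefs
```

In this sketch a predictor is treated as selected when its coefficient block is nonzero. The group-norm term is what zeroes out entire blocks, while the squared-norm term only shrinks them.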

Paper Structure

This paper contains 24 sections, 28 theorems, 210 equations, 7 figures, 4 tables.

Key Result

Proposition 1

Suppose that Condition C.ass:a1 holds. Then, for each $j=1,\ldots,p$, any minimizer $\widehat{f_j}$ of equ:mini1 must be in the space $\mathbb{M}_{nj}$.

Figures (7)

  • Figure 1: Simulation Scenario I: The ROC curves of fEnet and FLR-SCAD under the ultra-high-dimensional setting $(n, p, q) = (100, 200, 10)$. The ROC curves are obtained by varying $\lambda$ while holding the other hyperparameters at their optimal values.
  • Figure 1: Simulation Scenario II: The ROC curves of fEnet and FLR-SCAD under the ultra-high-dimensional setting. The ROC curves are obtained by varying $\lambda$ while holding the other hyperparameters at their optimal values.
  • Figure 2: Simulation Scenario I: The plots of FPR, FNR, and RER versus $\log_{10}(1-\alpha)$ for different values of $\theta$ under the ultra-high-dimensional setting with $\rho=0.75$.
  • Figure 2: Simulation Scenario II: The plots of FPR, FNR, and RER versus $\log_{10}(1-\alpha)$ for different values of $\theta$ under the ultra-high-dimensional setting.
  • Figure 3: The orthographic projections of a brain (light blue), where the 33 selected ROIs using the HCP data are marked in dark blue.
  • ...and 2 more figures
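
The captions above describe ROC curves traced out by varying $\lambda$, together with false positive and false negative rates (FPR, FNR) of variable selection. As a rough sketch of how such points could be computed, assuming the selected predictor indices are already available for each value of $\lambda$, the helper below uses the usual definitions of FPR and FNR for support recovery. The names `selection_rates` and `roc_points` and the dictionary layout of `path` are hypothetical, and the paper's exact evaluation protocol may differ.

```python
def selection_rates(selected, true_support, p):
    """Return (FPR, FNR) of a selected index set against the true support.

    FPR = false positives / number of truly inactive predictors
    FNR = false negatives / number of truly active predictors
    """
    selected, true_support = set(selected), set(true_support)
    q = len(true_support)
    false_pos = len(selected - true_support)
    false_neg = len(true_support - selected)
    fpr = false_pos / (p - q) if p > q else 0.0
    fnr = false_neg / q if q > 0 else 0.0
    return fpr, fnr


def roc_points(path, true_support, p):
    """Turn {lambda: selected indices} into sorted (FPR, TPR) pairs."""
    points = []
    for selected in path.values():
        fpr, fnr = selection_rates(selected, true_support, p)
        points.append((fpr, 1.0 - fnr))  # TPR = 1 - FNR
    return sorted(points)
```

For example, `path` could map each $\lambda$ on a grid to the indices of the nonzero coefficient blocks returned by a fitted model, and the resulting pairs can be passed to any plotting routine.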

Theorems & Definitions (56)

  • Remark 1
  • Remark 2
  • Proposition 1
  • Proposition 2
  • Remark 3
  • Theorem 1
  • Remark 4
  • Theorem 2
  • Corollary 1
  • Remark 5
  • ...and 46 more