Variable Selection and Minimax Prediction in High-dimensional Functional Linear Model
Xingche Guo, Yehua Li, Tailen Hsing
TL;DR
The paper develops an RKHS-based functional elastic-net estimator for high-dimensional functional linear models, where each predictor is an infinite-dimensional function. It proves existence and uniqueness of the estimator, derives non-asymptotic tail bounds for variable selection consistency under a functional irrepresentable condition, and shows that a post-selection refined estimator achieves the oracle minimax prediction rate even when the number of true predictors grows with the sample size. A scalable reduced-rank algorithm with a block-coordinate scheme is proposed and validated through extensive simulations and a real dataset from the Human Connectome Project, where 33 brain ROIs are consistently linked to fluid intelligence. The work advances theory and computation for ultra-high-dimensional functional regression, offering practical tools for variable selection and minimax-optimal prediction in infinite-dimensional settings.
Abstract
High-dimensional functional data have become increasingly prevalent in modern applications such as high-frequency financial data and neuroimaging data analysis. We investigate a class of high-dimensional linear regression models, where each predictor is a random element in an infinite-dimensional function space, and the number of functional predictors $p$ can potentially be ultra-high. Assuming that each of the unknown coefficient functions belongs to some reproducing kernel Hilbert space (RKHS), we regularize the fitting of the model by imposing a group elastic-net type of penalty on the RKHS norms of the coefficient functions. We show that our loss function is Gâteaux sub-differentiable, and that our functional elastic-net estimator exists uniquely in the product RKHS. Under suitable sparsity assumptions and a functional version of the irrepresentable condition, we derive a non-asymptotic tail bound for variable selection consistency of our method. Allowing the number of true functional predictors $q$ to diverge with the sample size, we also show that a post-selection refined estimator can achieve the oracle minimax optimal prediction rate. The proposed methods are illustrated through simulation studies and a real-data application from the Human Connectome Project.
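For orientation, the setup described in the abstract can be sketched in generic notation. Everything below is an illustrative assumption rather than the paper's own formulation: the symbols $Y_i$, $X_{ij}$, $\beta_j$, $\varepsilon_i$, $\mathcal{T}$, $\mathcal{H}_j$, $\lambda_1$, $\lambda_2$, the $1/(2n)$ scaling of the loss, and the exact form of the two penalty terms are chosen here only to show what a group elastic-net type penalty on RKHS norms typically looks like. The paper's actual penalty may use different norms, weights, or scaling.

$$
Y_i = \sum_{j=1}^{p} \int_{\mathcal{T}} X_{ij}(t)\,\beta_j(t)\,dt + \varepsilon_i, \qquad i = 1, \dots, n,
$$

$$
(\hat{\beta}_1, \dots, \hat{\beta}_p) = \arg\min_{\beta_j \in \mathcal{H}_j,\; j = 1, \dots, p} \; \frac{1}{2n} \sum_{i=1}^{n} \Big( Y_i - \sum_{j=1}^{p} \int_{\mathcal{T}} X_{ij}(t)\,\beta_j(t)\,dt \Big)^{2} + \lambda_1 \sum_{j=1}^{p} \|\beta_j\|_{\mathcal{H}_j} + \frac{\lambda_2}{2} \sum_{j=1}^{p} \|\beta_j\|_{\mathcal{H}_j}^{2}.
$$

In this sketch, the unsquared norm term (weight $\lambda_1$) acts on each coefficient function as a group and can set entire functions $\hat{\beta}_j$ to zero, which is what performs variable selection. The squared norm term (weight $\lambda_2$) makes the objective strictly convex, which is consistent with the uniqueness claim above. Sparsity then means that only $q$ of the $p$ true coefficient functions are nonzero.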
