Selection of functional predictors and smooth coefficient estimation for scalar-on-function regression models

Hedayat Fathi, Marzia A. Cremona, Federico Severino

TL;DR

This paper tackles variable selection in scalar-on-function regression with many functional predictors by introducing SOFIA, an adaptive Lasso framework that enforces regularity of the functional coefficients by restricting them to a Hilbert subspace $\mathbb{K}$ and solving a penalized least squares problem. A sieve-based finite-dimensional approximation built on the eigenfunctions of a trace-class operator $K$ enables scalable optimization via functional subgradients. The authors prove a functional oracle property under a Hilbert-space restricted eigenvalue condition, with convergence rates tied to the sieve dimension $m$. In extensive simulations and a real GDP-growth application, SOFIA shows strong active-variable recovery and predictive performance competitive with or superior to existing methods, while providing smooth, interpretable coefficient estimates. The work combines variable selection, regularization in an RKHS-like setting, and theoretical guarantees that cover the high-dimensional case in which the number of predictors exceeds the sample size.

Abstract

In the framework of scalar-on-function regression models, in which several functional variables are employed to predict a scalar response, we propose a methodology for selecting relevant functional predictors while simultaneously providing accurate smooth (or, more generally, regular) estimates of the functional coefficients. We suppose that the functional predictors belong to a real separable Hilbert space, while the functional coefficients belong to a specific subspace of this Hilbert space. Such a subspace can be a Reproducing Kernel Hilbert Space (RKHS) to ensure the desired regularity characteristics, such as smoothness or periodicity, for the coefficient estimates. Our procedure, called SOFIA (Scalar-On-Function Integrated Adaptive Lasso), is based on an adaptive penalized least squares algorithm that leverages functional subgradients to efficiently solve the minimization problem. We demonstrate that the proposed method satisfies the functional oracle property, even when the number of predictors exceeds the sample size. SOFIA's effectiveness in variable selection and coefficient estimation is evaluated through extensive simulation studies and a real-data application to GDP growth prediction.
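The two-stage idea described in the abstract (penalized least squares with adaptive weights, applied to a finite-dimensional representation of each functional coefficient) can be illustrated with a minimal sketch. This is not the authors' SOFIA algorithm: it assumes a cosine basis in place of the eigenfunctions of the operator $K$, uses a plain Euclidean group norm instead of the $\mathbb{K}$-norm, and swaps the functional-subgradient solver for a standard proximal-gradient (ISTA) loop. The function names, the simulated data, and the value of `lam` are all hypothetical choices made for illustration.

```python
import numpy as np

def cosine_basis(num, t):
    """First `num` cosine functions on [0, 1], orthonormal on a midpoint grid."""
    return np.sqrt(2.0) * np.cos(np.pi * np.outer(np.arange(1, num + 1), t))

def group_soft_threshold(B, thresh):
    """Shrink the Euclidean norm of each row of B by thresh[j], zeroing small rows."""
    norms = np.linalg.norm(B, axis=1)
    scale = np.maximum(0.0, 1.0 - thresh / np.maximum(norms, 1e-12))
    return B * scale[:, None]

def fit_group_lasso(S, y, lam, weights, n_iter=1000):
    """Proximal gradient for (1/2n)||y - sum_j S_j b_j||^2 + lam * sum_j w_j ||b_j||_2."""
    n, p, m = S.shape
    Z = S.reshape(n, p * m)                    # column j*m + k holds score k of predictor j
    step = n / np.linalg.norm(Z, 2) ** 2       # 1 / Lipschitz constant of the gradient
    B = np.zeros((p, m))
    for _ in range(n_iter):
        resid = Z @ B.ravel() - y
        grad = (Z.T @ resid / n).reshape(p, m)
        B = group_soft_threshold(B - step * grad, step * lam * weights)
    return B

# Simulated data (hypothetical): p functional predictors, only the first two active.
rng = np.random.default_rng(0)
n, p, m, T = 200, 20, 5, 100
t = (np.arange(T) + 0.5) / T                   # midpoint grid on [0, 1]
Phi = cosine_basis(m, t)                       # sieve basis, shape (m, T)
A = rng.standard_normal((n, p, 8)) / np.arange(1, 9)
X = A @ cosine_basis(8, t)                     # curves on the grid, shape (n, p, T)

beta = np.zeros((p, T))
beta[0] = Phi[0] + 0.5 * Phi[1]
beta[1] = -Phi[0] + 0.5 * Phi[2]
# Response: sum over predictors of the integral of X_ij * beta_j, plus noise.
y = (X * beta).mean(axis=2).sum(axis=1) + 0.1 * rng.standard_normal(n)

# Sieve step: project each curve onto the first m basis functions.
S = X @ Phi.T / T                              # scores, shape (n, p, m)

# Stage 1: unweighted group lasso. Stage 2: adaptive weights from stage 1.
lam = 0.1
B_init = fit_group_lasso(S, y, lam, np.ones(p))
weights = 1.0 / (np.linalg.norm(B_init, axis=1) + 1e-3)
B_hat = fit_group_lasso(S, y, lam, weights)

norms = np.linalg.norm(B_hat, axis=1)
selected = np.flatnonzero(norms > 0)           # indices of predictors kept by the fit
beta_hat = B_hat @ Phi                         # estimated coefficient curves, shape (p, T)
```

Predictors that look inactive after the first stage receive large weights, so the second stage penalizes them heavily, while predictors with large initial estimates are penalized only lightly. The smoothness of `beta_hat` here comes solely from truncating the basis at `m` terms; in the paper the regularity is instead imposed through the choice of the subspace $\mathbb{K}$.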

Paper Structure

This paper contains 15 sections, 5 theorems, 30 equations, 4 figures, 3 tables.

Key Result

Lemma 1

Let $\boldsymbol{X} = (X_{ij}) \in \mathbb{H}^{n\times p}$ and $\boldsymbol{\varepsilon} = (\varepsilon_1, \ldots, \varepsilon_n)$ be as in Assumption 1. For each $j = 1, \ldots, p$, define $Z_j := n^{-1} \sum_{i=1}^n \varepsilon_i X_{ij}$. Then, for any $t > 0$, where $M > 0$ is a constant depending only on the constants $C_1$ and $C_2$ in Assumption 1.
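To build intuition for the quantity $Z_j$ in Lemma 1, the following sketch simulates $Z_j = n^{-1} \sum_{i=1}^n \varepsilon_i X_{ij}$ for a single predictor and tracks how its $L^2([0,1])$ norm behaves as $n$ grows. It does not reproduce or verify the lemma's bound. The Gaussian errors, the five-term cosine expansion for the curves, and the helper name `mean_norm_Z` are assumptions made only for this illustration; under them, $\mathbb{E}\|Z_j\|^2 = 5/n$, so the norm should shrink at roughly the rate $n^{-1/2}$.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 100
t = (np.arange(T) + 0.5) / T                   # midpoint grid on [0, 1]
# Five cosine functions, orthonormal on the midpoint grid (assumed curve model).
basis = np.sqrt(2.0) * np.cos(np.pi * np.outer(np.arange(1, 6), t))

def mean_norm_Z(n, reps=200):
    """Monte Carlo average of the L2 norm of Z = n^{-1} * sum_i eps_i * X_i."""
    norms = np.empty(reps)
    for r in range(reps):
        X = rng.standard_normal((n, 5)) @ basis    # n random curves, shape (n, T)
        eps = rng.standard_normal(n)               # independent zero-mean errors
        Z = eps @ X / n                            # Hilbert-valued average on the grid
        norms[r] = np.sqrt(np.mean(Z ** 2))        # grid approximation of the L2 norm
    return norms.mean()

# Increasing n by a factor of 16 should shrink the norm by roughly a factor of 4.
results = {n: mean_norm_Z(n) for n in (100, 400, 1600)}
ratio = results[100] / results[1600]
```

A concentration inequality of the kind named in the lemma title controls the probability that $\|Z_j\|$ exceeds a threshold $t$, which is what allows the penalty level to dominate the noise term uniformly over all $p$ predictors.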

Figures (4)

  • Figure 1: The coefficient functions $\beta_j(t)$ of the active predictors for (a) $p_0=5$ and (b) $p_0=10$.
  • Figure 2: Comparison of the performance of different kernels under varying parameter values for SNR = 1. Subfigures represent the true positives, false positives, and root mean squared error (RMSE) in the very sparse and less sparse regimes, in the high-dimensional scale ($p=700$).
  • Figure 3: Comparison of the performance of different kernels under varying parameter values for SNR = 10. Subfigures represent the true positives, false positives, and root mean squared error (RMSE) in the very sparse and less sparse regimes, in the high-dimensional scale ($p=700$).
  • Figure 4: Estimated coefficients ($\hat{\boldsymbol{\beta}}$) for the three selected predictors for GDP growth.

Theorems & Definitions (5)

  • Lemma 1: Hilbert-valued concentration inequality
  • Theorem 1
  • Corollary 1
  • Theorem 2: Functional oracle property
  • Corollary 2