Variational Inference for Variable Selection in Scalar-on-Function Regression

Ana Carolina da Cruz, Camila P. E. de Souza, Pedro H. T. O. Sousa

TL;DR

This work develops a variational inference approach for estimation and variable selection in scalar-on-function regression, involving only functional covariates, and in partially functional regression models that also include scalar covariates.

Abstract

In practical regression applications, multiple covariates are often measured, but not all may be associated with the response variable. Identifying and including only the relevant covariates in the model is crucial for improving prediction accuracy. In this work, we develop a variational inference approach for estimation and variable selection in scalar-on-function regression, involving only functional covariates, and in partially functional regression models that also include scalar covariates. Specifically, we develop a variational expectation-maximization (VEM) algorithm, with a variational Bayes procedure implemented in the E-step to obtain approximate marginal posterior distributions for most model parameters, except for the regularization parameters, which are updated in the M-step. Our method accurately identifies relevant covariates while maintaining strong predictive performance, as demonstrated through extensive simulation studies across diverse scenarios. Compared with alternative approaches, including BGLSS (Bayesian Group Lasso with Spike-and-Slab priors), grLASSO (group Least Absolute Shrinkage and Selection Operator), grMCP (group Minimax Concave Penalty), and grSCAD (group Smoothly Clipped Absolute Deviation), our approach achieves a superior balance between goodness-of-fit and sparsity in most scenarios. We further illustrate its practical utility through real-data applications involving spectral analysis of sugar samples and weather measurements from Japan.
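One common way to make scalar-on-function regression computable is to expand each functional coefficient in a finite basis, which turns the model into a linear regression with one group of columns per functional covariate. Selecting a covariate then amounts to deciding whether its whole group of basis coefficients is zero. The sketch below illustrates only that reduction. It is not the VEM algorithm described above: it fits the grouped design by ordinary least squares as a stand-in, and the cosine basis, simulated data, sizes, and all variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions): samples, functional covariates, grid points, basis functions.
n, p, T, K = 200, 3, 101, 4
t = np.linspace(0.0, 1.0, T)

# Trapezoid-rule weights, so that f @ w approximates the integral of f over [0, 1].
w = np.full(T, t[1] - t[0])
w[[0, -1]] *= 0.5

# Hypothetical basis phi_k(t) = cos(k * pi * t), k = 0..K-1; shape (K, T).
Phi = np.stack([np.cos(k * np.pi * t) for k in range(K)])

# Simulated smooth functional covariates X[j, i, :] observed on the grid; shape (p, n, T).
X = rng.normal(size=(p, n, K)) @ Phi

# True coefficient functions beta_j(t) = B_true[j] @ Phi; the third covariate is irrelevant.
B_true = np.array([[1.0, -0.5, 0.0, 0.3],
                   [0.0, 0.8, -0.4, 0.0],
                   [0.0, 0.0, 0.0, 0.0]])
alpha_true, sigma = 2.0, 0.1

# Z[j][i, k] approximates the integral of X_ij(t) * phi_k(t) dt; shape (p, n, K).
Z = np.stack([(X[j] * w) @ Phi.T for j in range(p)])

# Response: intercept + sum over j of the integral of X_ij(t) * beta_j(t) dt + Gaussian noise.
y = alpha_true + sum(Z[j] @ B_true[j] for j in range(p)) + sigma * rng.normal(size=n)

# Grouped design matrix: one intercept column, then one block of K columns per covariate.
D = np.hstack([np.ones((n, 1))] + [Z[j] for j in range(p)])

# Ordinary least squares as a stand-in estimator (not the VEM algorithm).
coef, *_ = np.linalg.lstsq(D, y, rcond=None)
alpha_hat, B_hat = coef[0], coef[1:].reshape(p, K)

# Per-covariate group norms: group-sparse methods aim to set these to zero for irrelevant covariates.
group_norms = np.linalg.norm(B_hat, axis=1)
print(alpha_hat, group_norms)
```

In this simulated setting the group norm of the third block comes out close to zero while the first two stay clearly away from it. Least squares never sets a group exactly to zero, which is the gap that group penalties or spike-and-slab priors are meant to close.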

Paper Structure (28 sections, 44 equations, 8 figures, 8 tables, 1 algorithm)

Figures (8)

  • Figure 1: Simulation Study 1. Mean estimated curves for the two partial functional coefficients (blue), individual estimates per simulated dataset (grey), and boxplots of the estimated intercept across 100 simulated datasets, for sample size $n = 200$ and varying error variance ($\sigma^2 = 0.1, 0.5$). True values for the partial functional coefficients and intercept are shown in red.
  • Figure 2: Simulation Study 1. Estimated curves for the non-zero partial functional coefficient (blue) with 95% credible bands (black) for one of the simulated datasets under the scenario with sample size $n = 200$ and error variances $\sigma^2 = 0.1, 0.5$. The true values of the partial functional coefficient are shown in red.
  • Figure 3: Simulation Study 2. Mean estimated curves for the non-zero partial functional coefficients obtained from our method (blue) for sample size $n = 400$ and varying error variance ($\sigma^2 = 0.01$ for plots on the left, and $\sigma^2 = 0.05$ for plots on the right). The true values of the partial functional coefficients are shown in red, and the individual estimated curves obtained from our method across the 100 simulated datasets are shown in grey.
  • Figure 4: Simulation Study 2. Estimated curves for the non-zero partial functional coefficients from our VEM algorithm (blue) for one of the simulated datasets under the scenario with sample size $n = 400$ and error variances ($\sigma^2 = 0.01$ for plots on the left, and $\sigma^2 = 0.05$ for plots on the right). A 95% credible band (dotted curves) was obtained for our method. The true partial functional coefficients are shown in red.
  • Figure 5: Simulation Study 3. Mean estimated curves (blue) for the non-zero partial functional coefficient, individual estimated curves for each simulated dataset (grey) and boxplot of the estimated intercept across 100 simulated datasets, for sample size $n = 100$ and varying error variance ($\sigma^2 = 0.1$ for plots on the left, and $\sigma^2 = 0.5$ for plots on the right). The true values of the partial functional coefficient and intercept are shown in red.
  • ...and 3 more figures