Bayesian Kernel Regression for Functional Data

Minoru Kusaba, Megumi Iwayama, Ryo Yoshida

TL;DR

This work addresses functional output regression by introducing KRFD, a kernel-based model that predicts functions $Y(X,t)$ from covariates $X$ while exploiting the covariance structure across the output domain through a separable kernel and its RKHS. The method expresses $Y(X,t)$ as a kernel expansion in $t$ with $X$-dependent weights, which gives a linear-in-parameters form that admits analytic Bayesian estimation and Gaussian posterior predictive uncertainty. A sparse-data extension (KRSFD) broadens applicability to irregularly sampled functions, and a representer-theorem perspective links KRFD to multitask learning (MTL) with separable kernels while preserving a Bayesian treatment of uncertainty that is not common in classic MTL. Empirical results on dense artificial data, sparse artificial data, and density-of-states predictions for materials show that KRFD consistently outperforms FLM and sometimes KRR, and that KRFD alone among the compared models provides a predictive distribution for new inputs. The work highlights practical pathways for scalable kernel-based functional regression and underscores the method’s potential for materials science and other domains requiring uncertainty-aware functional predictions.
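
The linear-in-parameters form mentioned above can be written out as a short worked equation. The first line follows the Figure 1 schematic (kernel functions $k_T$ centered at the measurement points $t_1, \ldots, t_L$, weights $c_l(X)$, intercept $\mu(X)$). The second line is only an assumed notation for illustration: the coefficients $a_{il}$, the input kernel $k_X$, the training inputs $X_1, \ldots, X_N$, and the noise term $\varepsilon$ are not taken from this summary, and the paper's exact parameterization may differ.

$$
\begin{aligned}
Y(X,t) &= \mu(X) + \sum_{l=1}^{L} c_l(X)\, k_T(t, t_l) + \varepsilon, \\
c_l(X) &= \sum_{i=1}^{N} a_{il}\, k_X(X, X_i)
\;\Longrightarrow\;
Y(X,t) = \mu(X) + \sum_{i=1}^{N} \sum_{l=1}^{L} a_{il}\, k_X(X, X_i)\, k_T(t, t_l) + \varepsilon .
\end{aligned}
$$

Under this assumed form the output is linear in the coefficients $a_{il}$, and each basis function is a product $k_X(X, X_i)\, k_T(t, t_l)$, which is the separable-kernel structure that connects the model to multitask learning. With a Gaussian prior on the $a_{il}$ and Gaussian noise, the posterior and the predictive distribution are again Gaussian.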

Abstract

In supervised learning, the output variable to be predicted is often represented as a function, such as a spectrum or probability distribution. Despite its importance, functional output regression remains relatively unexplored. In this study, we propose a novel functional output regression model based on kernel methods. Unlike conventional approaches that independently train regressors with scalar outputs for each measurement point of the output function, our method leverages the covariance structure within the function values, akin to multitask learning, leading to enhanced learning efficiency and improved prediction accuracy. Compared with existing nonlinear function-on-scalar models in statistical functional data analysis, our model effectively handles high-dimensional nonlinearity while maintaining a simple model structure. Furthermore, the fully kernel-based formulation allows the model to be expressed within the framework of reproducing kernel Hilbert space (RKHS), providing an analytic form for parameter estimation and a solid foundation for further theoretical analysis. The proposed model delivers a functional output predictive distribution derived analytically from a Bayesian perspective, enabling the quantification of uncertainty in the predicted function. We demonstrate the model's enhanced prediction performance through experiments on artificial datasets and density of states prediction tasks in materials science.

Paper Structure

This paper contains 18 sections, 41 equations, 12 figures, 4 tables.

Figures (12)

  • Figure 1: Schematic of the KRFD models, showing how the function output $Y(X, t)$ is modeled. The bell-shaped functions represent the kernel functions $k_{T}(\cdot, \cdot)$, centered at the measurement points $t_1, \ldots, t_L$. Their weights $c_l(X)$ are $X$-dependent functions modeled by kernel regressors. The intercept term $\mu(X)$ is included to capture input-dependent shifts independent of $t$.
  • Figure 2: Prediction results for the test samples of the dense artificial data using the (a) KRFD, (b) KRRs, and (c) FLM models. The scatter plots and histograms are color-coded in five colors corresponding to the five independent data splits.
  • Figure 3: Functional prediction results for the test samples. The gray line shows the true data before adding observation noise, the red dots show the observed data points, and the light blue line shows the functional predictions. The title of each figure indicates the actual amplitude, frequency, phase, slope, and intercept values for that input. The first through third rows show the predictions using the KRFD model. The first row shows only the mean of the predictive distribution, the second row shows the predicted mean $\pm1$ standard deviation (blue band) of the predictive distribution, and the third row shows the results of sampling the function 300 times from the predictive distribution. The fourth and fifth rows show the predicted results for KRRs and FLM, respectively.
  • Figure 4: Prediction results for the test samples of the sparse artificial data using the (a) KRSFD and (b) FLM models. The scatter plots and histograms are color-coded in five colors corresponding to the five independent data splits.
  • Figure 5: Functional prediction results for the test samples. The gray line shows the true data before adding observation noise, the red dots show the observed data points, and the light blue line shows the functional predictions. The title of each figure indicates the actual amplitude, frequency, phase, slope, and intercept values for that input. The first through third rows show the predictions using the KRSFD model. The first row shows only the mean of the predictive distribution, the second row shows the predicted mean $\pm1$ standard deviation (blue band) of the predictive distribution, and the third row shows the results of sampling the function 300 times from the predictive distribution. The fourth row shows the predictions using the FLM model.
  • ...and 7 more figures
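
The model structure in the Figure 1 caption and the predictive plots described for Figure 3 (mean, $\pm1$ standard deviation band, sampled functions) can be illustrated with a small numerical sketch. This is not the authors' implementation. It assumes Gaussian (RBF) kernels for both $k_X$ and $k_T$, an isotropic Gaussian prior on the coefficients, a common dense grid for all outputs, and it omits the intercept $\mu(X)$. The toy data, the helper names `rbf` and `predict`, and all hyperparameter values are hypothetical.

```python
import numpy as np

def rbf(a, b, length_scale):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale**2)

rng = np.random.default_rng(0)

# Toy dense functional data: N inputs, each observed on the same L-point grid.
N, L = 20, 15
X = rng.uniform(-1.0, 1.0, size=(N, 2))          # covariates
t = np.linspace(0.0, 1.0, L)[:, None]            # measurement points t_1..t_L
Y = X[:, [0]] * np.sin(2 * np.pi * t.T + X[:, [1]])
Y = Y + 0.05 * rng.standard_normal((N, L))       # add observation noise

# Hypothetical hyperparameters (not taken from the paper).
ls_x, ls_t = 0.5, 0.1            # kernel length scales
alpha, sigma2 = 1.0, 0.05**2     # prior precision, noise variance

K_X = rbf(X, X, ls_x)            # N x N kernel over inputs
K_T = rbf(t, t, ls_t)            # L x L kernel over the output domain

# Assumed model without intercept: Y = K_X @ A @ K_T + noise, A is N x L.
# With row-major flattening, vec(K_X A K_T) = kron(K_X, K_T) @ vec(A)
# because K_T is symmetric.
Phi = np.kron(K_X, K_T)          # (N*L) x (N*L) design matrix
y = Y.ravel()

# Gaussian posterior over vec(A) for prior N(0, I / alpha).
S = np.linalg.inv(alpha * np.eye(N * L) + Phi.T @ Phi / sigma2)
m = S @ Phi.T @ y / sigma2

def predict(x_new, t_new):
    """Predictive mean and covariance of Y(x_new, .) on the grid t_new."""
    k_x = rbf(x_new[None, :], X, ls_x)           # 1 x N
    k_t = rbf(t_new[:, None], t, ls_t)           # M x L
    phi = np.kron(k_x, k_t)                      # M x (N*L)
    mean = phi @ m
    cov = phi @ S @ phi.T + sigma2 * np.eye(len(t_new))
    return mean, 0.5 * (cov + cov.T)             # symmetrize for sampling

t_new = np.linspace(0.0, 1.0, 50)
mean, cov = predict(np.array([0.3, -0.2]), t_new)
band = np.sqrt(np.diag(cov))                     # +/- 1 standard deviation
samples = rng.multivariate_normal(mean, cov, size=5)  # sampled functions
train_rmse = np.sqrt(np.mean((Phi @ m - y) ** 2))
```

Forming the Kronecker-product design matrix explicitly costs memory and time that grow quickly with $N \cdot L$, so this only suits toy sizes. It was chosen here because it keeps the correspondence between the separable kernel and the Gaussian posterior easy to read.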