Table of Contents
Fetching ...

Scalable mixed-domain Gaussian process modeling and model reduction for longitudinal data

Juho Timonen, Harri Lähdesmäki

TL;DR

This work tackles the computational bottlenecks of Gaussian process modeling with mixed continuous and categorical inputs in longitudinal data. It introduces a basis-function, Hilbert-space approximation that yields $O(NM)$ complexity and supports arbitrary observation models, enabling full Bayesian inference for additively decomposed mixed-domain kernels. The authors develop a practical model-reduction workflow combining additive variance decomposition and projection predictive selection, demonstrating effective reduction of model size while preserving predictive performance. Through simulations and real-data experiments (e.g., Canadian weather and US elections), the method achieves near-exact accuracy at a fraction of the runtime and provides interpretable, category-specific and shared effects. An open-source R package lgpr2 accompanies the approach, making scalable, interpretable mixed-domain GP modeling accessible for large longitudinal datasets.

Abstract

Gaussian process (GP) models that combine both categorical and continuous input variables have found use in analysis of longitudinal data and computer experiments. However, standard inference for these models has the typical cubic scaling, and common scalable approximation schemes for GPs cannot be applied since the covariance function is non-continuous. In this work, we derive a basis function approximation scheme for mixed-domain covariance functions, which scales linearly with respect to the number of observations and total number of basis functions. The proposed approach is naturally applicable to also Bayesian GP regression with discrete observation models. We demonstrate the scalability of the approach and compare model reduction techniques for additive GP models in a longitudinal data context. We confirm that we can approximate the exact GP model accurately in a fraction of the runtime compared to fitting the corresponding exact model. In addition, we demonstrate a scalable model reduction workflow for obtaining smaller and more interpretable models when dealing with a large number of candidate predictors.

Scalable mixed-domain Gaussian process modeling and model reduction for longitudinal data

TL;DR

This work tackles the computational bottlenecks of Gaussian process modeling with mixed continuous and categorical inputs in longitudinal data. It introduces a basis-function, Hilbert-space approximation that yields complexity and supports arbitrary observation models, enabling full Bayesian inference for additively decomposed mixed-domain kernels. The authors develop a practical model-reduction workflow combining additive variance decomposition and projection predictive selection, demonstrating effective reduction of model size while preserving predictive performance. Through simulations and real-data experiments (e.g., Canadian weather and US elections), the method achieves near-exact accuracy at a fraction of the runtime and provides interpretable, category-specific and shared effects. An open-source R package lgpr2 accompanies the approach, making scalable, interpretable mixed-domain GP modeling accessible for large longitudinal datasets.

Abstract

Gaussian process (GP) models that combine both categorical and continuous input variables have found use in analysis of longitudinal data and computer experiments. However, standard inference for these models has the typical cubic scaling, and common scalable approximation schemes for GPs cannot be applied since the covariance function is non-continuous. In this work, we derive a basis function approximation scheme for mixed-domain covariance functions, which scales linearly with respect to the number of observations and total number of basis functions. The proposed approach is naturally applicable to also Bayesian GP regression with discrete observation models. We demonstrate the scalability of the approach and compare model reduction techniques for additive GP models in a longitudinal data context. We confirm that we can approximate the exact GP model accurately in a fraction of the runtime compared to fitting the corresponding exact model. In addition, we demonstrate a scalable model reduction workflow for obtaining smaller and more interpretable models when dealing with a large number of candidate predictors.

Paper Structure

This paper contains 32 sections, 37 equations, 6 figures.

Figures (6)

  • Figure 1: The posterior predictive mean for four different approximate models in one replication of Experiment 1. The models differ only based on the number of basis functions $B$ and domain scaling factor $c$. The yellow line shows the posterior predictive mean for the corresponding exact GP model as a reference. Black dots are training data and red crosses test data.
  • Figure 2: Mean log predictive density for test data in Experiment 1 for varying training data set sizes $N_{\text{train}}$ and domain scales $c$. Black dashed line corresponds to the exact model. Results are averages over 30 replications of the experiment.
  • Figure 3: Runtimes of fitting the exact and approximate models in Experiment 1. a) Exact model vs. approximation with $B=64$. b) Approximations using different values for $B$. The markers show the average time taken to run a chain, when a total of $30 \times 4$ MCMC chains were run for 2000 iterations each. The vertical error bars show $\pm$ one standard deviation (not shown for approximate model in panel a. Note the smaller y-axis scale in panel b. We see empirically that the runtime of the approximate model scales linearly in both $N$ and $B$.
  • Figure 4: Results for the Canadian weather data experiment with $B=32$ and $c=1.5$. Panels a-c show the marginal posterior distribution of each of the three components $f^{(1)}$, $f^{(2)}$ and $f^{(3)}$ (mean $\pm$ two times standard deviation). Standard deviation is not shown for $f^{(3)}$ for clarity. The functions are on the standardized scale (response variable normalized to zero mean and unit variance). We see for example that the regions tend to have larger differences during winter. Notice that due to the specified categorical correlation structure, the effects of each region ($f^{(2)}$) sum to zero at each time point, and so do also the effects specific to each station ($f^{(3)}$). The sum-to-zero constraint disentangles the region or station specific time effects from the shared time effects.
  • Figure 5: Results for the US election prediction experiment with $B=24$ and $c=2.0$. Figures a)-c) show the marginal posterior distribution of each of the three components $f^{(1)}$, $f^{(2)}$ and $f^{(3)}$ (mean $\pm$ two times standard deviation). Standard deviation is not shown for $f^{(2)}$ and $f^{(3)}$ for clarity.
  • ...and 1 more figures