Adaptive RKHS Fourier Features for Compositional Gaussian Process Models

Xinxing Shi, Thomas Baldwin-McDonald, Mauricio A. Álvarez

TL;DR

This paper introduces Ordinary Differential Equation (ODE)-based RKHS Fourier features that allow adaptive amplitude and phase modulation through convolution operations, and embeds these adjustable features within a doubly stochastic variational inference framework; the resulting model shows improved predictive performance across various regression tasks.

Abstract

Deep Gaussian Processes (DGPs) leverage a compositional structure to model non-stationary processes. DGPs typically rely on local inducing point approximations across intermediate GP layers. Recent advances in DGP inference have shown that incorporating global Fourier features from the Reproducing Kernel Hilbert Space (RKHS) can enhance the DGPs' capability to capture complex non-stationary patterns. This paper extends the use of these features to compositional GPs involving linear transformations. In particular, we introduce Ordinary Differential Equation (ODE)-based RKHS Fourier features that allow for adaptive amplitude and phase modulation through convolution operations. This convolutional formulation relates our work to recently proposed deep latent force models, a multi-layer structure designed for modelling nonlinear dynamical systems. By embedding these adjustable RKHS Fourier features within a doubly stochastic variational inference framework, our model exhibits improved predictive performance across various regression tasks.
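
As a rough illustration of the amplitude and phase modulation described above, the sketch below convolves a single Fourier basis function $\cos(zt)$ with the Green's function of a first-order ODE. It assumes, for illustration only, the normalised ODE $\gamma^{-1} f'(t) + f(t) = u(t)$, whose causal Green's function is $G(s)=\gamma e^{-\gamma s}$ for $s\ge 0$; the paper's own ODE parameterisation may differ, and the function names are hypothetical. In steady state the filtered basis is $\tfrac{\gamma}{\sqrt{\gamma^2+z^2}}\cos(zt-\theta)$ with phase delay $\theta=\arctan(z/\gamma)$: the frequency is unchanged, while higher frequencies are damped more and delayed more.

```python
import numpy as np


def green_first_order(s, gamma):
    """Causal Green's function G(s) = gamma * exp(-gamma * s), s >= 0.

    Assumption for illustration: the normalised ODE (1/gamma) f'(t) + f(t) = u(t).
    """
    return np.where(s >= 0.0, gamma * np.exp(-gamma * s), 0.0)


def convolve_cosine(t, z, gamma, horizon=50.0, n=200_001):
    """Numerically evaluate (G * cos(z .))(t) = int_0^horizon G(s) cos(z (t - s)) ds.

    The basis is treated as extending over the whole real line (steady-state response).
    """
    s = np.linspace(0.0, horizon, n)
    integrand = green_first_order(s, gamma) * np.cos(z * (t - s))
    h = s[1] - s[0]
    # Composite trapezoidal rule, written out to avoid version-specific NumPy helpers.
    return h * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))


def analytic_response(t, z, gamma):
    """Closed-form steady-state response of the assumed filter to cos(z t)."""
    amplitude = gamma / np.hypot(gamma, z)  # gamma / sqrt(gamma^2 + z^2)
    theta = np.arctan2(z, gamma)            # phase delay arctan(z / gamma)
    return amplitude * np.cos(z * t - theta), amplitude, theta


if __name__ == "__main__":
    for gamma in (1.0, 10.0, 100.0):
        numeric = convolve_cosine(t=0.3, z=4.0, gamma=gamma)
        closed, amp, theta = analytic_response(0.3, 4.0, gamma)
        print(f"gamma={gamma:6.1f}  amplitude={amp:.3f}  phase delay={theta:.3f} rad  "
              f"|numeric - closed|={abs(numeric - closed):.1e}")
```

Increasing $\gamma$ drives the amplitude towards 1 and the phase delay towards 0, so under this assumed ODE the filtered feature approaches the unfiltered Fourier basis.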

Paper Structure (38 sections, 34 equations, 10 figures, 12 tables)

Figures (10)

  • Figure 1: Covariance functions of LFMs (left) and Variational Fourier Response Features (VFRFs) (right). The latent force $u(t)$ uses a Matérn-$\tfrac{1}{2}$ kernel with length-scale $l=0.2$ (left, dashed). Left: the centred kernel of the input latent force (dashed) and of the output process $f(t)$ of LFMs with different ODE parameters $\gamma$ (solid). Unlike the LFM kernel induced by Eq. \ref{eq:lfm-equation} (green), the modified LFM kernel from Eq. \ref{eq:ode-ab} reverts to the input Matérn-$\tfrac{1}{2}$ kernel as $\gamma$ increases (red to brown). Right: VFRFs ($G\circ\phi$, red solid) and VFFs ($\phi$, blue dashed) for two inducing frequencies, $z_m=\tfrac{8\pi}{b-a}$ (upper) and $z_m=\tfrac{28\pi}{b-a}$ (lower). The upper panel shows the cosine basis with a phase delay of $\theta\approx\tfrac{\pi}{4}$ relative to the VFF, and the lower panel shows the sine basis with a phase delay of $\theta\approx\tfrac{5\pi}{12}$.
  • Figure 2: A conceptual illustration of how our model (\ref{fig:subfig-dlfm}) differs from the IDDGP (\ref{fig:subfig-idgp}) and the DLFM-RFF (\ref{fig:subfig-dlfm-rff}). Compared with the IDDGP, our model additionally applies to each input dimension a convolution operator $G$ derived from an ODE: $f(t)=\int G(t-\tau)u(\tau)\mathop{}\!\mathrm{d}\tau$, where $G(\cdot)$ is the Green's function and $u(\cdot)$ has a GP prior with a Matérn kernel. Whereas the DLFM-RFF uses RFFs $\varphi(\cdot)$ to build a low-rank approximation of the covariance matrix and performs inference over the weights $W$, our model uses Fourier features obtained by applying linear transformations to GPs and performs inference in an inter-domain manner. For a high-level comparison with other models, see Fig. \ref{fig:model-comparison}.
  • Figure 3: Illustrative example of Matérn-$\tfrac{1}{2}$ LFM posteriors with VFRFs and RFFs. The feature type used by each model is indicated at the lower right. Top row: predictive posteriors with 20, 80, and 500 RFFs. Bottom row: predictive posteriors with 20 and 80 inducing frequencies, and the exact LFM posterior. Noisy observations are marked with red dots, posterior predictive means with blue lines, and uncertainty (one or two standard deviations) with varying shades of blue. In this example, VFRFs approximate the true posterior more closely, whereas RFFs underestimate the variance when few features are used.
  • Figure 4: Comparison of the posterior predictive distributions of different models on noisy observations of a multi-step function. The models and features used are noted at the bottom right of each subplot. Dashed lines are samples from the predictive distributions. Deep models use two layers, and all models use Matérn-$\tfrac{3}{2}$ kernels except the DGP (upper left) and the DLFM-RFF (lower left), which use RBF kernels. All models are trained with 20 inducing points or Fourier features per layer. The DLFMs with VFRFs perform best among the models compared.
  • Figure 5: (a) Learning progression of DLFMs and IDDGPs with $M$ inducing frequencies on the TIMIT dataset, reported as the negative ELBO, test average RMSE, and NMLL. The DLFM in yellow keeps $\beta$ fixed at $10^{-6}$ for the first 2000 training iterations, after which $\alpha$ and $\beta$ are allowed to vary. The DLFMs in red train the ODE parameters from the start. The DLFM-VFRFs consistently outperform the IDDGPs. (b) Mean standardised RMSE and NMLL, with standard deviations over 10 random seeds, for models using different numbers of inducing frequencies. The number after the hyphen in each y-axis label is the number of inducing frequencies/points. Lower values (further left) indicate better performance.
  • ...and 5 more figures
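
The behaviour in the left panel of Figure 1, where the LFM kernel reverts to the Matérn-$\tfrac{1}{2}$ input kernel as $\gamma$ grows, can be checked in closed form under a simplifying assumption. Take the hypothetical normalised ODE $\gamma^{-1} f'(t) + f(t) = u(t)$ with $G(s)=\gamma e^{-\gamma s}$, and a latent force with $k_u(r)=\sigma^2 e^{-\lambda|r|}$, $\lambda=1/l$. Multiplying the spectral density $2\sigma^2\lambda/(\lambda^2+\omega^2)$ by the squared filter gain $\gamma^2/(\gamma^2+\omega^2)$ and inverting with partial fractions gives, for $\gamma\neq\lambda$, $k_f(r)=\sigma^2(\lambda\gamma e^{-\gamma|r|}-\gamma^2 e^{-\lambda|r|})/(\lambda^2-\gamma^2)$, so $k_f(0)=\sigma^2\gamma/(\lambda+\gamma)\to\sigma^2$ and $k_f(r)\to\sigma^2 e^{-\lambda|r|}$ as $\gamma\to\infty$. This normalisation is an assumption made for the sketch, not necessarily the modified ODE the caption refers to; the code evaluates the closed form and cross-checks it by numerical spectral integration.

```python
import numpy as np


def matern12(r, lengthscale=0.2, variance=1.0):
    """Matérn-1/2 (exponential) kernel of the latent force u."""
    return variance * np.exp(-np.abs(r) / lengthscale)


def lfm_matern12_kernel(r, gamma, lengthscale=0.2, variance=1.0):
    """Closed-form stationary covariance of f = G * u for the assumed G(s) = gamma * exp(-gamma s).

    Requires gamma != 1 / lengthscale so that the two exponential rates differ.
    """
    lam = 1.0 / lengthscale
    r = np.abs(r)
    return variance * (lam * gamma * np.exp(-gamma * r)
                       - gamma**2 * np.exp(-lam * r)) / (lam**2 - gamma**2)


def lfm_kernel_spectral(r, gamma, lengthscale=0.2, variance=1.0, w_max=2000.0, n=400_001):
    """Numerical cross-check for a scalar lag r: k_f(r) = (1/pi) int_0^inf |H(w)|^2 S_u(w) cos(w r) dw."""
    lam = 1.0 / lengthscale
    w = np.linspace(0.0, w_max, n)
    s_u = 2.0 * variance * lam / (lam**2 + w**2)  # spectral density of the Matérn-1/2 force
    gain2 = gamma**2 / (gamma**2 + w**2)          # squared gain |H(w)|^2 of the assumed filter
    integrand = gain2 * s_u * np.cos(w * r)
    dw = w[1] - w[0]
    # Trapezoidal rule on [0, w_max]; the integrand is even, so this is half the full-line integral.
    return dw * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1])) / np.pi


if __name__ == "__main__":
    lags = np.array([0.0, 0.1, 0.2, 0.5])
    print("Matérn-1/2:    ", np.round(matern12(lags), 3))
    for gamma in (1.0, 20.0, 1000.0):
        print(f"gamma={gamma:7.1f}:", np.round(lfm_matern12_kernel(lags, gamma), 3))
```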
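
For contrast, the DLFM-RFF baseline in Figure 2 relies on random Fourier features $\varphi(\cdot)$ for a low-rank covariance approximation. The sketch below shows the standard random Fourier feature construction of Rahimi and Recht for a one-dimensional Matérn-$\tfrac{1}{2}$ kernel: frequencies are drawn from the kernel's normalised spectral density (a Cauchy distribution with scale $1/l$) and paired with uniform random phases. It only approximates the kernel of the latent force; how the baseline propagates these features through the ODE is not shown, and the function names are hypothetical.

```python
import numpy as np


def sample_matern12_rff(num_features, lengthscale=0.2, variance=1.0, seed=0):
    """Draw one random Fourier feature map phi with phi(x) @ phi(x').T ~= k(x, x').

    Target kernel: k(x, x') = variance * exp(-|x - x'| / lengthscale) (Matérn-1/2, 1-D inputs).
    """
    rng = np.random.default_rng(seed)
    # The normalised spectral density of exp(-|r| / l) is a Cauchy distribution with scale 1 / l.
    omega = rng.standard_cauchy(num_features) / lengthscale
    phase = rng.uniform(0.0, 2.0 * np.pi, num_features)
    scale = np.sqrt(2.0 * variance / num_features)

    def phi(x):
        x = np.asarray(x, dtype=float)
        return scale * np.cos(np.outer(x, omega) + phase)  # shape (len(x), num_features)

    return phi


def matern12_gram(x, lengthscale=0.2, variance=1.0):
    """Exact Matérn-1/2 Gram matrix for 1-D inputs."""
    r = np.abs(x[:, None] - x[None, :])
    return variance * np.exp(-r / lengthscale)


if __name__ == "__main__":
    x = np.linspace(0.0, 1.0, 30)
    exact = matern12_gram(x)
    for m in (20, 80, 500):  # the RFF counts shown in the top row of Figure 3
        phi = sample_matern12_rff(m, seed=1)
        approx = phi(x) @ phi(x).T  # rank <= m approximation of the Gram matrix
        print(f"M={m:4d}  mean |K_rff - K| = {np.abs(approx - exact).mean():.3f}")
```

The approximate Gram matrix has rank at most $M$, so with few features it can deviate noticeably from the exact kernel matrix.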