Fast Gaussian Process Approximations for Autocorrelated Data

Ahmadreza Chokhachian, Matthias Katzfuss, Yu Ding

TL;DR

This work scales Gaussian process regression to autocorrelated data by combining tempGP's data thinning with three fast GP approximations (twinGP, laGP, and scaled Vecchia), yielding thinned twinGP, thinned laGP, and thinned SV. By partitioning the data into decorrelated blocks and tailoring each approximation to this structure, the authors achieve substantial computational speedups while maintaining prediction accuracy, with thinned SV often providing the best overall performance and thinned twinGP offering the fastest inference. They demonstrate the approach on robotics, satellite drag, and wind-energy datasets, showing that temporal overfitting is mitigated and that speedups can reach orders of magnitude relative to full tempGP. The results suggest strong practical applicability for real-time or large-scale autocorrelated domains, such as wind farms, climate monitoring, and remote sensing.

Abstract

This paper addresses how to speed up computation for Gaussian process models trained on autocorrelated data. The Gaussian process model is a powerful tool commonly used in nonlinear regression applications. Standard regression modeling assumes random samples and independent and identically distributed noise. Various fast approximations that speed up Gaussian process regression work under this standard setting. For autocorrelated data, however, failing to account for autocorrelation leads to a phenomenon known as temporal overfitting, which degrades model performance on new test instances. To handle autocorrelated data, existing fast Gaussian process approximations have to be modified; one such approach is to segment the originally correlated data points into blocks within which the data are de-correlated. This work explains how to make some of the existing Gaussian process approximations work with blocked data. Numerical experiments across diverse application datasets demonstrate that the proposed approaches can substantially accelerate computation for Gaussian process regression on autocorrelated data without compromising model prediction performance.
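
The blocking idea in the abstract can be illustrated with a minimal Python sketch. It assumes one simple thinning scheme, in which block k collects every T-th time-ordered point starting at offset k, so that neighbors within a block are T time steps apart. The function names, the AR(1) test series, and the thinning number T = 10 are illustrative assumptions, not taken from the paper or from any tempGP implementation.

```python
import numpy as np


def thin_into_blocks(n_points, thinning_number):
    """Split time-ordered indices 0..n_points-1 into `thinning_number` blocks.

    Block k holds indices k, k+T, k+2T, ... (an assumed thinning scheme).
    """
    if thinning_number < 1:
        raise ValueError("thinning_number must be a positive integer")
    idx = np.arange(n_points)
    return [idx[k::thinning_number] for k in range(thinning_number)]


def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a 1-D sequence."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))


# Synthetic AR(1) series standing in for autocorrelated measurements.
rng = np.random.default_rng(0)
n, phi = 5000, 0.8
noise = rng.standard_normal(n)
series = np.empty(n)
series[0] = noise[0]
for t in range(1, n):
    series[t] = phi * series[t - 1] + noise[t]

blocks = thin_into_blocks(n, thinning_number=10)

# Consecutive points are strongly correlated in the full series,
# but much less so inside a single thinned block.
full_acf = lag1_autocorr(series)
block_acf = lag1_autocorr(series[blocks[0]])
print(f"lag-1 autocorrelation, full series: {full_acf:.2f}")
print(f"lag-1 autocorrelation, one block:   {block_acf:.2f}")
```

Each block can then be treated as approximately independent data by a fast GP approximation. Choosing the thinning number from the data, for example from the partial autocorrelation function, is left out of this sketch.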

Paper Structure

This paper contains 15 sections, 13 equations, 6 figures, 9 tables.

Figures (6)

  • Figure 1: The thinning mechanism
  • Figure 2: Applying twinGP to thinned data via model averaging.
  • Figure 3: The mechanism for forming the conditioning set in the thinned scaled Vecchia
  • Figure 4: The left plot displays the PACF for the LHS dataset, which shows no significant autocorrelation. The middle and right plots use different lag values, $\mathbf{M=13}$ and $\mathbf{M=24}$, respectively. The corresponding thinning numbers are 2, 15, and 23. In both the middle and right plots, partial autocorrelation persists over moderate and longer lags, indicating stronger temporal dependence.
  • Figure 5: The effect of increasing the thinning number on the out-of-sample RMSE. The 25 highest values are colored in red, and the 25 lowest values are in green.
  • ...and 1 more figures