Empirical Gaussian Processes

Jihao Andreas Lin, Sebastian Ament, Louis C. Tiao, David Eriksson, Maximilian Balandat, Eytan Bakshy

TL;DR

Gaussian processes typically rely on handcrafted kernels, which limit flexibility and extrapolation. The paper introduces Empirical Gaussian Processes, which learn a nonparametric prior by estimating the mean $m_S$ and covariance $k_S$ from historical realizations, with a theoretical guarantee that the learned GP converges to the best Gaussian approximation of the true process in the $D_{\mathrm{KL}}$ sense. A closed-form EM algorithm handles learning from heterogeneous, non-grid observations, and a continuous-domain extension uses a latent reference grid with an interpolation mechanism and residual corrections to enable robust extrapolation. Empirical evaluations demonstrate competitive performance on learning-curve extrapolation and time-series forecasting benchmarks, often surpassing some neural baselines while offering substantial training efficiency.
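To make the core idea concrete, here is a minimal NumPy sketch of estimating a GP prior from fully observed historical sample paths on a shared grid and then conditioning on a few new observations. It is an illustration under stated assumptions, not the authors' implementation: the function names `empirical_prior` and `posterior`, the $1/S$ normalization, the small noise term, and the synthetic sinusoidal data are all hypothetical choices made for this example.

```python
import numpy as np

def empirical_prior(F):
    """Estimate mean and covariance from S sample paths on a shared grid.

    F has shape (S, N). The 1/S normalization is an assumption of this sketch.
    """
    m = F.mean(axis=0)
    R = F - m
    K = R.T @ R / F.shape[0]
    return m, K

def posterior(m, K, obs_idx, y, noise_var=1e-6):
    """Standard Gaussian conditioning on noisy observations at grid indices."""
    K_oo = K[np.ix_(obs_idx, obs_idx)] + noise_var * np.eye(len(obs_idx))
    K_ao = K[:, obs_idx]
    mean = m + K_ao @ np.linalg.solve(K_oo, y - m[obs_idx])
    cov = K - K_ao @ np.linalg.solve(K_oo, K_ao.T)
    return mean, cov

# Synthetic historical data (hypothetical): sinusoids with Gaussian coefficients.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
S = 512
a = rng.normal(size=(S, 1))
b = rng.normal(size=(S, 1))
F = a * np.sin(2 * np.pi * x) + b * np.cos(2 * np.pi * x)

m_hat, K_hat = empirical_prior(F)

# A new function from the same family, observed at only three grid points.
f_new = 0.7 * np.sin(2 * np.pi * x) - 0.3 * np.cos(2 * np.pi * x)
obs_idx = np.array([5, 20, 35])
post_mean, post_cov = posterior(m_hat, K_hat, obs_idx, f_new[obs_idx])

print("max abs error of posterior mean:", np.abs(post_mean - f_new).max())
```

Because the learned covariance encodes the periodic structure of the historical paths, three observations suffice to recover the new function across the whole grid in this toy setting; no periodic kernel was specified by hand.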

Abstract

Gaussian processes (GPs) are powerful and widely used probabilistic regression models, but their effectiveness in practice is often limited by the choice of kernel function. This kernel function is typically handcrafted from a small set of standard functions, a process that requires expert knowledge, results in limited adaptivity to data, and imposes strong assumptions on the hypothesis space. We study Empirical GPs, a principled framework for constructing flexible, data-driven GP priors that overcome these limitations. Rather than relying on standard parametric kernels, we estimate the mean and covariance functions empirically from a corpus of historical observations, enabling the prior to reflect rich, non-trivial covariance structures present in the data. Theoretically, we show that the resulting model converges to the GP that is closest (in the KL-divergence sense) to the real data-generating process. Practically, we formulate the problem of learning the GP prior from independent datasets as likelihood estimation and derive an Expectation-Maximization algorithm with closed-form updates, allowing the model to handle heterogeneous observation locations across datasets. We demonstrate that Empirical GPs achieve competitive performance on learning curve extrapolation and time series forecasting benchmarks.
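The EM idea for heterogeneous observation locations can be sketched with the textbook EM updates for a multivariate Gaussian with missing entries. This is a simplified stand-in, not the paper's algorithm: it assumes every dataset observes a subset of a shared finite grid, it uses a fixed small jitter as observation noise, and the helper name `em_step` and the synthetic exponential-kernel data are hypothetical.

```python
import numpy as np

def em_step(m, K, datasets, jitter=1e-6):
    """One EM iteration for a Gaussian prior N(m, K) on a shared grid.

    datasets is a list of (idx, y) pairs: observed grid indices and values.
    """
    N = len(m)
    cond_means = []
    cond_cov_sum = np.zeros((N, N))
    # E-step: conditional mean and covariance of each full path given its data.
    for idx, y in datasets:
        K_oo = K[np.ix_(idx, idx)] + jitter * np.eye(len(idx))
        K_ao = K[:, idx]
        G = np.linalg.solve(K_oo, K_ao.T).T  # K_ao @ inv(K_oo)
        cond_means.append(m + G @ (y - m[idx]))
        cond_cov_sum += K - G @ K_ao.T
    # M-step: closed-form updates from the expected sufficient statistics.
    M = np.stack(cond_means)
    m_new = M.mean(axis=0)
    R = M - m_new
    K_new = (R.T @ R + cond_cov_sum) / len(datasets)
    return m_new, 0.5 * (K_new + K_new.T)  # symmetrize against round-off

# Synthetic check (hypothetical setup): exponential kernel, 5 of 8 points seen.
rng = np.random.default_rng(0)
N, S, n_obs = 8, 500, 5
x = np.linspace(0.0, 1.0, N)
K_true = np.exp(-np.abs(x[:, None] - x[None, :]) / 0.5)
paths = rng.multivariate_normal(np.zeros(N), K_true, size=S)
datasets = []
for f in paths:
    idx = np.sort(rng.choice(N, size=n_obs, replace=False))
    datasets.append((idx, f[idx]))

m_em, K_em = np.zeros(N), np.eye(N)
err_init = np.linalg.norm(K_em - K_true)
for _ in range(50):
    m_em, K_em = em_step(m_em, K_em, datasets)
err_final = np.linalg.norm(K_em - K_true)

print("Frobenius error before / after EM:", err_init, err_final)
```

Both steps are closed-form: the E-step is Gaussian conditioning per dataset, and the M-step averages the resulting conditional moments, so no gradient-based optimizer is needed in this simplified setting.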

Paper Structure (34 sections, 8 theorems, 32 equations, 8 figures, 3 tables)

This paper contains 34 sections, 8 theorems, 32 equations, 8 figures, 3 tables.

Key Result

Proposition 1

Assume that $k$ is continuous and that its canonical semi-metric satisfies Dudley's entropy integral condition. Then, for almost every sequence of sample paths $\{ f_i \}_{i=1}^S$, we have $\mathcal{GP}\bigl(m_S, k_S\bigr) \rightharpoonup \mathcal{GP}\bigl(m, k\bigr)$ as $S \to \infty$.
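
For orientation, a natural reading of $m_S$ and $k_S$ is as the sample mean and sample covariance of the $S$ historical paths. The exact definitions are not reproduced in this summary, so the normalization below ($1/S$ rather than $1/(S-1)$) is an assumption:

$$
m_S(x) = \frac{1}{S} \sum_{i=1}^{S} f_i(x),
\qquad
k_S(x, x') = \frac{1}{S} \sum_{i=1}^{S} \bigl(f_i(x) - m_S(x)\bigr)\bigl(f_i(x') - m_S(x')\bigr).
$$

Under this reading, the proposition states that the GP defined by these estimators converges weakly to the GP with the true mean $m$ and covariance $k$ as the number of paths grows.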

Figures (8)

  • Figure 1: Without human intervention, Empirical GP captures the behavior of kernels which were handcrafted by human experts. Left: On financial stock market data, Empirical GP matches the canonical geometric Brownian motion model. Right: On atmospheric climate data, Empirical GP infers seasonality and an upward trend, achieving 21% lower RMSE than an expert-designed kernel (rasmussen2024co2model).
  • Figure 2: Top: Sample paths from a variety of target GPs with distinct covariance functions are used as historical data for Empirical GP. Bottom: On a fixed set of new observations, the Empirical GP posterior converges to the posterior of the corresponding target GP, when using 256 sample paths from the latter as historical data, demonstrating data-driven adaptivity and convergence to the target GP.
  • Figure 3: Left: Convergence of the EM-estimated prior mean and covariance to the ground truth, as a function of the number of independent incomplete datasets (S), and for different numbers of observed data points per dataset (n). Right: The estimated covariance matrix ($S = 1024, n = 64$) follows the ground truth closely.
  • Figure 4: Left: The EM-inferred conditional distributions for each set of independent historical observations (points). The historical inputs are sampled uniformly from one of two continuous intervals: (-1, 0.2) and (-0.2, 1). The EM-inferred conditional distributions exhibit the periodic characteristics beyond the respective interval associated with a given set of historical observations, and revert gracefully to the base kernel (Matérn) beyond the range of any historical observations (-1, 1). Right: The posterior distribution associated with the EM-inferred GP prior (blue), compared to the ground truth posterior (dashed). The Empirical GP closely follows the ground truth within the interval of historical observations (-1, 1), and reverts back to the base kernel's behavior beyond.
  • Figure 5: Runtime and performance statistics of Empirical GP on the GIFT-Eval time series forecasting benchmark. Left: As the context length increases, SVD acceleration significantly decreases runtime to a virtually negligible amount. Left middle: Increasing the context length consistently improves the performance of Empirical GP relative to itself with a shorter context length. Right middle: SVD acceleration delivers consistent runtime improvements for varying amounts of historical data. Right: Increasing the amount of historical data tends to improve and stabilize the performance of Empirical GP relative to the Seasonal Naive baseline.
  • ...and 3 more figures

Theorems & Definitions (14)

  • Proposition 1
  • Proposition 2
  • Lemma 1
  • proof
  • Lemma 2
  • proof
  • Proposition 2
  • proof
  • Corollary 1
  • proof
  • ...and 4 more