Fast Multitask Gaussian Process Regression

Aleksei G. Sorokin, Pieterjan Robbe, Fred J. Hickernell

Abstract

Gaussian process (GP) regression is a powerful probabilistic modeling technique with built-in uncertainty quantification. When one has access to multiple correlated simulations (tasks), it is common to fit a multitask GP (MTGP) surrogate, which is capable of capturing both inter-task and intra-task correlations. However, with a total of $N$ evaluations across all tasks, fitting an MTGP is often infeasible due to the $\mathcal{O}(N^2)$ storage and $\mathcal{O}(N^3)$ computations required to store, solve a linear system in, and compute the determinant of the $N \times N$ Gram matrix of pairwise kernel evaluations. In the single-task setting, one may reduce the required storage to $\mathcal{O}(N)$ and computations to $\mathcal{O}(N \log N)$ by fitting "fast GPs", which pair low-discrepancy design points from quasi-Monte Carlo with special kernel forms to yield nicely structured Gram matrices, e.g., circulant matrices. This article generalizes fast GPs to fast MTGPs, which pair low-discrepancy design points for each task with special product kernel forms to yield nicely structured block Gram matrices, e.g., circulant block matrices. An algorithm is presented to efficiently store, invert, and compute the determinant of such Gram matrices with optionally different sampling nodes and different sample sizes for each task. Derivations for fast MTGP Bayesian cubature are also provided. A GPU-compatible, open-source Python implementation is made available in the FastGPs package (https://alegresor.github.io/fastgps/). We validate the efficiency of our algorithm and implementation compared to standard techniques on a range of problems with low numbers of tasks and large sample sizes.
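To make the single-task "fast GP" idea concrete, the following is a minimal NumPy sketch, not the FastGPs implementation. It assumes a shifted rank-1 lattice in natural order and a product kernel built from the Bernoulli polynomial $B_2$, a standard shift-invariant choice. The generating vector `g`, the weight `eta`, the nugget `1e-8`, the toy response `y`, and all function names are illustrative assumptions. Under these assumptions the Gram matrix is circulant, so its first column is all that needs storing, and one FFT gives the eigenvalues used for the linear solve and the log-determinant.

```python
import numpy as np

def lattice(n, g, shift):
    """Shifted rank-1 lattice in natural order: x_i = (i*g/n + shift) mod 1."""
    i = np.arange(n)[:, None]
    return (i * g[None, :] / n + shift[None, :]) % 1.0

def si_kernel(delta, eta=1.0):
    """Shift-invariant product kernel evaluated at differences delta in [0,1)^d.

    Uses B_2(t) = t^2 - t + 1/6; eta is a hypothetical kernel weight.
    """
    b2 = delta**2 - delta + 1.0 / 6.0
    return np.prod(1.0 + eta * 2.0 * np.pi**2 * b2, axis=-1)

rng = np.random.default_rng(0)
n, d = 64, 2
g = np.array([1, 19])              # illustrative generating vector (assumption)
x = lattice(n, g, rng.random(d))
y = np.sin(2 * np.pi * x[:, 0]) + x[:, 1]   # toy response (assumption)

# O(n) storage: only the first column of the circulant Gram matrix.
col = si_kernel((x - x[0]) % 1.0)
col[0] += 1e-8                     # small nugget on the diagonal (assumption)

# O(n log n) work: the FFT of the first column gives the eigenvalues.
lam = np.fft.fft(col).real
alpha = np.fft.ifft(np.fft.fft(y) / lam).real   # solves K @ alpha = y
logdet = np.sum(np.log(lam))                    # log det K

# Dense O(n^2) Gram matrix, built only to check the fast results.
K = si_kernel((x[:, None, :] - x[None, :, :]) % 1.0) + 1e-8 * np.eye(n)
```

The dense matrix `K` is formed here purely for verification; a fast GP would keep only `col` and `lam`.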

Paper Structure (16 sections, 11 theorems, 21 equations, 6 figures, 3 tables, 1 algorithm)


Key Result

Theorem 2.1

The NMLL loss (eq:GP_NMLL_loss) and the GCV loss (eq:GP_GCV_loss) are minimized when the prior mean is, respectively,

Figures (6)

  • Figure 1: Independent identically distributed (IID) and low-discrepancy (LD) sequences. Each sequence has three independent randomizations shown in different colors, one for each task. The LD lattice has three independent uniform random shifts while the LD digital sequence has three independent uniform random digital-shifts. Different numbers of points are used for different tasks. Notice the more uniform coverage of LD sequences compared to IID points.
  • Figure 2: Structures and the inversion algorithm for an $L=3$ fast MTGP with digital sequences and a digitally-shift-invariant kernel.
  • Figure 3: Time per optimization step of our fast MTGPs for $L=2$ tasks.
  • Figure 4: Posterior mean estimates of a fast MTGP fit to the multifidelity Rosenbrock function.
  • Figure 5: Regression $L_2$ relative errors for the borehole problem.
  • ...and 1 more figure

Theorems & Definitions (20)

  • Definition 2.1: Symmetric positive definite kernel
  • Theorem 2.1: GP optimal constant prior means
  • Definition 2.2: Shifted rank-$1$ lattice
  • Definition 2.3: Shift-invariant kernel
  • Definition 2.4: Digital-shift operator $\oplus$
  • Definition 2.5: Base-2 digitally-shifted digital sequences
  • Definition 2.6: Digitally-shift-invariant kernel
  • Theorem 2.2: Fast GP computations
  • Lemma 2.1
  • Theorem 2.3: Fast Bayesian cubature
  • ...and 10 more
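The digital-shift definitions listed above can be illustrated with a small NumPy sketch under stated assumptions. It takes the base-2 digital-shift operator $\oplus$ to be digit-wise addition modulo 2 (bitwise XOR on fixed-precision binary expansions) and uses the one-dimensional van der Corput sequence in natural order as the digital sequence. The bit depth `t`, the function `toy_kernel`, and all variable names are hypothetical; `toy_kernel` only demonstrates matrix structure and is not claimed to be positive definite. In this setting the Gram matrix entry $(i,j)$ depends only on `i XOR j`, and such a matrix is diagonalized by the Sylvester Hadamard matrix, which is the digital-sequence analogue of the circulant/FFT structure for lattices.

```python
import numpy as np

m, t = 3, 16          # n = 2^m points, t-bit fixed precision (assumptions)
n = 2**m

def bit_reverse(i, m):
    """Reverse the m-bit binary expansion of the integer i."""
    return int(format(i, f"0{m}b")[::-1], 2)

# van der Corput points stored as t-bit integers: x_i = x_int[i] / 2^t.
x_int = np.array([bit_reverse(i, m) << (t - m) for i in range(n)], dtype=np.int64)

# Digital shift: XOR every point with one random t-bit integer.
rng = np.random.default_rng(1)
delta = int(rng.integers(0, 2**t))
xs_int = x_int ^ delta

# The shift cancels in pairwise XOR differences, leaving x_{i XOR j}.
idx = np.arange(n)
diff = xs_int[:, None] ^ xs_int[None, :]
structure_ok = np.array_equal(diff, x_int[idx[:, None] ^ idx[None, :]])

def toy_kernel(z_int):
    """Hypothetical function of the XOR difference, for structure only."""
    return 1.0 / (1.0 + z_int / 2**t)

K = toy_kernel(diff)               # K[i, j] depends only on i XOR j

# Sylvester Hadamard matrix built from Kronecker products.
H = np.array([[1.0]])
for _ in range(m):
    H = np.kron(H, np.array([[1.0, 1.0], [1.0, -1.0]]))

eig = H @ K[:, 0]                  # eigenvalues from the first column alone
D = H @ K @ H / n                  # diagonal, with eig on the diagonal
```

The dense products with `H` cost $\mathcal{O}(n^2)$ and are used for clarity; a fast Walsh-Hadamard transform would compute `H @ K[:, 0]` in $\mathcal{O}(n \log n)$.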