Fast Multitask Gaussian Process Regression

Aleksei G. Sorokin; Pieterjan Robbe; Fred J. Hickernell

Abstract

Gaussian process (GP) regression is a powerful probabilistic modeling technique with built-in uncertainty quantification. When one has access to multiple correlated simulations (tasks), it is common to fit a multitask GP (MTGP) surrogate that captures both inter-task and intra-task correlations. However, with a total of $N$ evaluations across all tasks, fitting an MTGP is often infeasible due to the $\mathcal{O}(N^2)$ storage and $\mathcal{O}(N^3)$ computation required to store the $N \times N$ Gram matrix of pairwise kernel evaluations, solve a linear system in it, and compute its determinant. In the single-task setting, one may reduce the required storage to $\mathcal{O}(N)$ and computation to $\mathcal{O}(N \log N)$ by fitting "fast GPs", which pair low-discrepancy design points from quasi-Monte Carlo with special kernel forms that yield nicely structured Gram matrices, e.g., circulant matrices. This article generalizes fast GPs to fast MTGPs, which pair low-discrepancy design points for each task with special product kernel forms that yield nicely structured block Gram matrices, e.g., circulant block matrices. An algorithm is presented to efficiently store, invert, and compute the determinant of such Gram matrices, optionally with different sampling nodes and different sample sizes for each task. Derivations for fast MTGP Bayesian cubature are also provided. A GPU-compatible, open-source Python implementation is available in the FastGPs package (https://alegresor.github.io/fastgps/). We validate the efficiency of our algorithm and implementation against standard techniques on a range of problems with few tasks and large sample sizes.
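To make the single-task claim concrete, the following is a minimal NumPy sketch, not the FastGPs implementation, of why a shift-invariant kernel on a shifted rank-1 lattice gives an $\mathcal{O}(N \log N)$ fit. The kernel `korobov_kernel` (a product of $1 + \gamma\, 2\pi^2 B_2(\{x_j - y_j\})$ factors with the Bernoulli polynomial $B_2(t) = t^2 - t + 1/6$), the generating vector, and all helper names are assumptions chosen for illustration. Because the Gram matrix depends only on $(i - j) \bmod n$, it is circulant, so its eigenvalues are the FFT of its first column, and both the solve and the log-determinant follow from those eigenvalues.

```python
import numpy as np

def korobov_kernel(x, y, gamma=1.0):
    # Hypothetical shift-invariant product kernel:
    # prod_j (1 + gamma * 2 pi^2 * B2({x_j - y_j})), with B2(t) = t^2 - t + 1/6.
    t = np.mod(x - y, 1.0)
    b2 = t**2 - t + 1.0 / 6.0
    return np.prod(1.0 + gamma * 2.0 * np.pi**2 * b2, axis=-1)

def shifted_lattice(n, z, shift):
    # Rank-1 lattice points x_i = (i * z / n + shift) mod 1, i = 0, ..., n-1.
    i = np.arange(n)[:, None]
    return np.mod(i * np.asarray(z)[None, :] / n + shift[None, :], 1.0)

def fast_solve_and_logdet(x, y, gamma=1.0):
    # The Gram matrix is circulant, so its eigenvalues are the FFT of its first column.
    c = korobov_kernel(x, x[0], gamma)              # first column K[:, 0]
    lam = np.fft.fft(c).real                        # real because K is symmetric
    alpha = np.fft.ifft(np.fft.fft(y) / lam).real   # K^{-1} y in O(n log n)
    logdet = np.sum(np.log(lam))                    # log det K = sum of log eigenvalues
    return alpha, logdet

rng = np.random.default_rng(7)
n, z = 64, np.array([1, 19])                        # hypothetical generating vector
x = shifted_lattice(n, z, rng.random(2))
y = np.sin(2 * np.pi * x[:, 0]) + x[:, 1] ** 2
alpha, logdet = fast_solve_and_logdet(x, y)
```

Only the first column of the Gram matrix is ever formed, which is where the $\mathcal{O}(N)$ storage comes from; a dense `np.linalg.solve` on the full matrix should agree with `alpha` up to rounding.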

Paper Structure (16 sections, 11 theorems, 21 equations, 6 figures, 3 tables, 1 algorithm)

Introduction
Methods
Gaussian Processes
Fast Gaussian Processes
Multitask Gaussian Processes
Fast Multitask Gaussian Processes
Numerical Experiments
Scope of Configurations
Rosenbrock
Ackley
Borehole
Elliptic PDE
Cookies in the Oven
Benchmark Results
Conclusions and Future Work

Key Result

Theorem 2.1

The negative marginal log-likelihood (NMLL) loss (eq:GP_NMLL_loss) and the generalized cross-validation (GCV) loss (eq:GP_GCV_loss) are minimized when the prior mean is, respectively, …
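The statement of Theorem 2.1 is cut off in this copy, and the sketch below does not reconstruct it. As background only, it illustrates the standard generalized-least-squares fact for the usual Gaussian NMLL with a constant prior mean $\tau$: the quadratic term $(y - \tau \mathbf{1})^\top K^{-1} (y - \tau \mathbf{1})$ is minimized at $\tau = \mathbf{1}^\top K^{-1} y \,/\, \mathbf{1}^\top K^{-1} \mathbf{1}$. Whether this matches the exact form of the NMLL case in Theorem 2.1 is an assumption, and the GCV case is not covered. The Gram matrix here is a random SPD stand-in, and the helper names are hypothetical.

```python
import numpy as np

def kinv_apply(C, v):
    # Apply K^{-1} to v using a Cholesky factor K = C C^T (two triangular solves).
    return np.linalg.solve(C.T, np.linalg.solve(C, v))

def gls_constant_mean(K, y):
    # tau = (1^T K^{-1} y) / (1^T K^{-1} 1), the minimizer of the NMLL over constant means.
    C = np.linalg.cholesky(K)
    ones = np.ones_like(y)
    return float(ones @ kinv_apply(C, y) / (ones @ kinv_apply(C, ones)))

def nmll(K, y, tau):
    # Standard Gaussian negative marginal log-likelihood with constant prior mean tau.
    r = y - tau
    _, logdet = np.linalg.slogdet(K)
    return 0.5 * (r @ np.linalg.solve(K, r) + logdet + len(y) * np.log(2 * np.pi))

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 8))
K = A @ A.T + 8.0 * np.eye(8)        # hypothetical SPD Gram matrix
y = rng.standard_normal(8) + 3.0
tau = gls_constant_mean(K, y)
```

Setting the derivative $-\mathbf{1}^\top K^{-1}(y - \tau \mathbf{1})$ to zero gives the same $\tau$, so nudging `tau` in either direction should only increase `nmll`.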

Figures (6)

Figure 1: Independent and identically distributed (IID) and low-discrepancy (LD) sequences. Each sequence has three independent randomizations shown in different colors, one for each task. The LD lattice has three independent uniform random shifts, while the LD digital sequence has three independent uniform random digital shifts. Different numbers of points are used for different tasks. Notice the more uniform coverage of the LD sequences compared to the IID points.
Figure 2: Structures and the inversion algorithm for an $L=3$ fast MTGP with digital sequences and a digitally-shift-invariant kernel.
Figure 3: Time per optimization step of our fast MTGPs for $L=2$ tasks.
Figure 4: Posterior mean estimates of a fast MTGP fit to the multifidelity Rosenbrock function.
Figure 5: Regression $L_2$ relative errors for the borehole problem.
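The randomized point sets described in Figure 1 can be sketched as follows. This is an illustration under stated assumptions, not the FastGPs generators: the lattice uses a hypothetical generating vector `(1, 11)`, and the digital sequence is the one-dimensional base-2 van der Corput sequence (a digital sequence whose generating matrix is the identity), digitally shifted by XOR-ing $m$-bit integer representations with a random $m$-bit shift. The per-task sample sizes are hypothetical.

```python
import numpy as np

def shifted_lattice(n, z, rng):
    # Rank-1 lattice {i z / n + Delta} mod 1 with an independent uniform shift Delta.
    i = np.arange(n)[:, None]
    delta = rng.random(len(z))
    return np.mod(i * np.asarray(z) / n + delta, 1.0)

def digitally_shifted_van_der_corput(n, m, rng):
    # First n base-2 van der Corput points, stored as m-bit integers via bit reversal,
    # then digitally shifted: x XOR Delta on the binary digits (Delta truncated to m bits).
    assert n <= 2**m
    ints = np.array([int(format(i, f"0{m}b")[::-1], 2) for i in range(n)])
    shift = int(rng.integers(0, 2**m))
    return np.bitwise_xor(ints, shift) / 2**m

rng = np.random.default_rng(0)
sizes = [16, 32, 64]                                   # hypothetical per-task sample sizes
lattices = [shifted_lattice(n, (1, 11), rng) for n in sizes]
digital = [digitally_shifted_van_der_corput(n, 10, rng) for n in sizes]
```

Each task gets its own independent shift, and when the sample size is a power of two both constructions place exactly one point in every interval $[j/n, (j+1)/n)$ of the first coordinate, which is the even coverage the figure contrasts with IID sampling.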

Theorems & Definitions (20)

Definition 2.1: Symmetric positive definite kernel
Theorem 2.1: GP optimal constant prior means
Definition 2.2: Shifted rank-$1$ lattice
Definition 2.3: Shift-invariant kernel
Definition 2.4: Digital-shift operator $\oplus$
Definition 2.5: Base-2 digitally-shifted digital sequences
Definition 2.6: Digitally-shift-invariant kernel
Theorem 2.2: Fast GP computations
Lemma 2.1
Theorem 2.3: Fast Bayesian cubature
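The multitask generalization in the abstract (product kernels that yield circulant block Gram matrices) can be illustrated in the simplest case. The sketch below assumes equal sample sizes, every task using a differently shifted copy of the same rank-1 lattice, and a product kernel $B_{l l'}\, k(x - x')$ with a hypothetical task covariance $B$ and the same hypothetical shift-invariant $k$ as before. Under these assumptions each block is circulant, so one FFT per task turns the $Ln \times Ln$ system into $n$ independent $L \times L$ systems, one per frequency. This is not the paper's algorithm, which also handles different sample sizes per task; the function names are illustrative.

```python
import numpy as np

def kernel(t, gamma=1.0):
    # Hypothetical shift-invariant kernel on [0,1)^d, evaluated at differences t.
    t = np.mod(t, 1.0)
    return np.prod(1.0 + gamma * 2.0 * np.pi**2 * (t**2 - t + 1.0 / 6.0), axis=-1)

def mtgp_fast_solve(xs, ys, B, gamma=1.0):
    # xs[l]: (n, d) points for task l, all shifted copies of one rank-1 lattice.
    # Gram block (l, lp) has entries B[l, lp] * k(x_i^(l) - x_j^(lp)) and is circulant.
    L, n = len(xs), len(xs[0])
    lam = np.empty((n, L, L), dtype=complex)
    for l in range(L):
        for lp in range(L):
            c = B[l, lp] * kernel(xs[l] - xs[lp][0], gamma)  # first column of block (l, lp)
            lam[:, l, lp] = np.fft.fft(c)                     # its circulant eigenvalues
    Y = np.stack([np.fft.fft(y) for y in ys], axis=1)          # (n, L)
    # Per frequency, solve an L x L Hermitian positive definite system.
    A = np.linalg.solve(lam, Y[:, :, None])[:, :, 0]
    alphas = [np.fft.ifft(A[:, l]).real for l in range(L)]
    _, logabsdets = np.linalg.slogdet(lam)                     # log det K = sum over frequencies
    return alphas, float(np.sum(logabsdets))

rng = np.random.default_rng(3)
n, z, L = 32, np.array([1, 7]), 3                              # hypothetical sizes and generating vector
B = np.array([[1.0, 0.6, 0.3], [0.6, 1.0, 0.5], [0.3, 0.5, 1.0]])  # hypothetical task covariance
base = np.arange(n)[:, None] * z[None, :] / n
xs = [np.mod(base + rng.random(2), 1.0) for _ in range(L)]      # independent shift per task
ys = [np.cos(2 * np.pi * x[:, 0]) * (l + 1) + x[:, 1] for l, x in enumerate(xs)]
alphas, logdet = mtgp_fast_solve(xs, ys, B)
```

The cost is $L^2$ FFTs of length $n$ plus $n$ small dense solves, so for a low number of tasks it scales like the single-task fast GP rather than like a dense $\mathcal{O}((Ln)^3)$ factorization.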