GP+: A Python Library for Kernel-based learning via Gaussian Processes

Amin Yousefpour, Zahra Zanjani Foumani, Mehdi Shishehbor, Carlos Mora, Ramin Bostanabad

TL;DR

This paper makes methodological contributions that enable probabilistic data fusion and inverse parameter estimation, and equip GPs with parsimonious parametric mean functions which span mixed feature spaces that have both categorical and quantitative variables.

Abstract

In this paper we introduce GP+, an open-source library for kernel-based learning via Gaussian processes (GPs) which are powerful statistical models that are completely characterized by their parametric covariance and mean functions. GP+ is built on PyTorch and provides a user-friendly and object-oriented tool for probabilistic learning and inference. As we demonstrate with a host of examples, GP+ has a few unique advantages over other GP modeling libraries. We achieve these advantages primarily by integrating nonlinear manifold learning techniques with GPs' covariance and mean functions. As part of introducing GP+, in this paper we also make methodological contributions that (1) enable probabilistic data fusion and inverse parameter estimation, and (2) equip GPs with parsimonious parametric mean functions which span mixed feature spaces that have both categorical and quantitative variables. We demonstrate the impact of these contributions in the context of Bayesian optimization, multi-fidelity modeling, sensitivity analysis, and calibration of computer models.

Paper Structure (31 sections, 54 equations, 21 figures, 13 tables)


Figures (21)

  • Figure 1: Schematic illustration of continuation-based optimization: The profile of $L_{MAP}$ in Eq. \ref{eq: map-gp} is smoothed with a larger $\epsilon$ (or, equivalently, a larger nugget).
  • Figure 2: Emulation via GP+ in mixed input spaces: We first endow the categorical variables $\boldsymbol{t}$ with some quantitative prior representations which are then mapped to a low-dimensional embedding with a parametric function. The embedded variables $\boldsymbol{h}$ are then concatenated with $\boldsymbol{x}$ and fed into the mean and covariance functions. All the model parameters are jointly learnt via MAP.
  • Figure 3: Graphical representation of multi-fidelity modeling techniques: The method of KOH [RN705] (a) and its extension to hierarchical techniques (b) impose specific relations between the data sources. However, GP+ (c) does not impose any prior relation among the data sources, and its structure resembles an undirected graph.
  • Figure 4: Probabilistic multi-fidelity modeling via GP+: Categorical inputs $\boldsymbol{t}$ are mapped to latent points in the $h$-space while the source indicator variable $s$ is mapped to a conditional distribution in $z$-space. Both mappings are achieved via deterministic and differentiable functions. Due to the probabilistic nature of $\boldsymbol{z}$, multiple forward passes are required to obtain the final outputs of the model.
  • Figure 5: Multi-fidelity modeling via mixed basis functions: Two generic options are defined in GP+ for building the mixed bases: (1) predetermined bases where multiple bases like polynomial, $\sin(\cdot)$ and $\cos(\cdot)$ can be defined for each data source, (2) FFNNs with user-defined architectures. All the parameters of the mean functions have a normal prior and are jointly learned through MAP.
  • ...and 16 more figures
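
The mixed-input pipeline described in the Figure 2 caption can be illustrated with a small sketch. This is not the GP+ API: the one-hot prior representation, the linear map `A`, the Gaussian (RBF) covariance, and all function names and numeric values below are assumptions chosen for illustration. In the library the map parameters would be learned jointly with the other hyperparameters via MAP, whereas here they are fixed by hand.

```python
import math

def one_hot(level, n_levels):
    """Quantitative prior representation of a categorical level (assumed one-hot)."""
    return [1.0 if i == level else 0.0 for i in range(n_levels)]

def embed(prior, A):
    """Parametric map (assumed linear) from the prior representation to latent h.

    A has shape (latent_dim, n_levels); each row yields one latent coordinate.
    """
    return [sum(a * p for a, p in zip(row, prior)) for row in A]

def rbf_kernel(u, v, lengthscales):
    """Gaussian (RBF) covariance evaluated on the concatenated vector [x, h]."""
    sq = sum(((ui - vi) / l) ** 2 for ui, vi, l in zip(u, v, lengthscales))
    return math.exp(-0.5 * sq)

def mixed_kernel(x1, t1, x2, t2, A, n_levels, lengthscales):
    """Embed the categorical input t, concatenate with x, then apply the kernel."""
    u = list(x1) + embed(one_hot(t1, n_levels), A)
    v = list(x2) + embed(one_hot(t2, n_levels), A)
    return rbf_kernel(u, v, lengthscales)

# Hypothetical values: 3 categorical levels embedded in a 2-D latent space.
# Levels 1 and 2 sit close together in h-space; level 0 sits far from both.
A = [[0.0, 1.0, 1.1],
     [0.0, 0.5, 0.4]]
lengthscales = [1.0, 1.0, 1.0]  # one for x, two for the latent coordinates

k_same = mixed_kernel([0.3], 1, [0.3], 1, A, 3, lengthscales)
k_near = mixed_kernel([0.3], 1, [0.3], 2, A, 3, lengthscales)
k_far = mixed_kernel([0.3], 0, [0.3], 2, A, 3, lengthscales)
print(k_same, k_near, k_far)
```

With these hand-picked values, levels that are close in the latent space yield a higher covariance than levels that are far apart, which is the sense in which the learned embedding encodes similarity among categories.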