Generalized and Scalable Deep Gaussian Process Emulation

Deyu Ming, Daniel Williamson

Abstract

Gaussian process (GP) emulators have become essential tools for approximating complex simulators, significantly reducing computational demands in optimization, sensitivity analysis, and model calibration. While traditional GP emulators effectively model continuous, Gaussian-distributed simulator outputs with homogeneous variability, they typically struggle with discrete, heteroskedastic Gaussian, or otherwise non-Gaussian data, limiting their applicability to increasingly common stochastic simulators. In this work, we introduce a scalable Generalized Deep Gaussian Process (GDGP) emulation framework designed to accommodate simulators with heteroskedastic Gaussian outputs and a wide range of non-Gaussian response distributions, including Poisson, negative binomial, and categorical distributions. The GDGP framework leverages the expressiveness of deep Gaussian processes (DGPs) and extends them with latent GP structures linked to the observed outputs through a likelihood layer, enabling it to capture the complex, non-stationary behavior inherent in many simulators while also modeling non-Gaussian simulator outputs. We make GDGP scalable by incorporating the Vecchia approximation for settings with many input locations, and we develop efficient inference procedures for handling large numbers of replicates. In particular, we present methodological developments that further accelerate computation for heteroskedastic Gaussian responses. Through a series of synthetic and empirical examples, we demonstrate that these extensions make GDGP emulators practical to deploy and yield a unified methodology capable of addressing diverse modeling challenges. The proposed GDGP framework is implemented in the open-source R package dgpsi.
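For concreteness, the snippet below sketches how such an emulator might be fitted with dgpsi. It is a minimal illustration, assuming the package's documented dgp() constructor with a likelihood argument (e.g., "Hetero" for heteroskedastic Gaussian responses, with "Poisson" and "NegBin" among the alternatives) and a vecchia flag enabling the Vecchia approximation; the toy simulator and all data here are invented for illustration, and argument names may differ across package versions.

  # Minimal sketch (illustrative, not from the paper): a two-layer GDGP
  # emulator with a heteroskedastic-Gaussian likelihood layer via dgpsi.
  # dgpsi wraps a Python backend, so the first call may set up an environment.
  library(dgpsi)

  # Toy stochastic simulator: both the mean and the noise level vary with x.
  simulator <- function(x) {
    rnorm(length(x), mean = sin(2 * pi * x), sd = exp(1.5 * x - 1))
  }

  set.seed(1)
  x_grid <- seq(0, 1, length.out = 50)
  X <- matrix(rep(x_grid, each = 10), ncol = 1)  # 50 unique inputs, 10 replicates each
  Y <- matrix(simulator(X[, 1]), ncol = 1)

  # likelihood = "Hetero" attaches a heteroskedastic-Gaussian likelihood layer;
  # vecchia = TRUE activates the Vecchia approximation with conditioning sets
  # of size M, for scalability in the number of input locations.
  m <- dgp(X, Y, depth = 2, likelihood = "Hetero", vecchia = TRUE, M = 25)

  # Emulator predictions (mean and variance of the response) at new inputs.
  x_new <- matrix(seq(0, 1, length.out = 200), ncol = 1)
  pred <- predict(m, x = x_new)

Swapping the likelihood argument for "Poisson", "NegBin", or (in recent package versions) "Categorical" would target the other response types discussed above.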

Paper Structure

This paper contains 28 sections, 5 theorems, 83 equations, 6 figures, 3 tables, and 4 algorithms.

Key Result

Proposition 4.1

Under the Vecchia approximation, the mean $\mu^{(q)}_{1\rightarrow L,k}(\mathbf{x}_0)$ and variance ${\sigma^2}^{(q)}_{1\rightarrow L,k}(\mathbf{x}_0)$ of the univariate normal distribution defined by $\widehat{p}(f^{(q)}_0|\mathbf{x}_0;\mathbf{f}_k,\{\mathbf{w}^{(p)}_{l}\}_k,\mathbf{x})$ can be obtained recursively for $l=2,\dots,L$ and $q=1,\dots,P_l$, with ${\mu}^{(q)}_{1\rightarrow 1,k}(\mathbf{x}_0)$ and ${\sigma^2}^{(q)}_{1\rightarrow 1,k}(\mathbf{x}_0)$ given by the Vecchia-approximated GP predictive mean and variance of the first layer.
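For orientation, the Vecchia approximation underlying this recursion replaces a joint Gaussian density over $n$ latent values with a product of low-dimensional Gaussian conditionals. In generic notation (chosen here for illustration rather than copied from the paper), with conditioning sets $c(i)$ of size at most $m \ll n$,

$$\widehat{p}(w_1,\dots,w_n) \;=\; \prod_{i=1}^{n} p\big(w_i \,\big|\, \mathbf{w}_{c(i)}\big), \qquad p\big(w_i \,\big|\, \mathbf{w}_{c(i)}\big) \;=\; \mathcal{N}\Big(K_{i,c(i)}K_{c(i),c(i)}^{-1}\mathbf{w}_{c(i)},\; K_{i,i}-K_{i,c(i)}K_{c(i),c(i)}^{-1}K_{c(i),i}\Big),$$

where $K$ denotes the layer's covariance matrix. Each factor costs $\mathcal{O}(m^3)$ rather than $\mathcal{O}(n^3)$, which is what makes the layer-by-layer propagation of means and variances in Proposition 4.1 scalable; Figure 5 compares training conditioning sets of sizes $5$ to $75$ (with prediction conditioning sets of size $200$) against the full GDGP.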

Figures (6)

  • Figure 1: The hierarchy of the DGP model that produces $\mathbf{F}$ given the input $\mathbf{x}$.
  • Figure 2: The hierarchy of the GDGP. The $\mathcal{L}$ node is the likelihood layer that represents the distributional relation between $\mathbf{F}$ and $\mathbf{Y}$.
  • Figure 3: NRMSEs (lower is better) and NCRPSs (lower is better) for hetGP, bhetGP, and GDGP, trained on $100$ unique input locations with $R\in\{20,40,60,80,100\}$ replicates per location, for emulating the mean function $\mu(x)$ and log-variance function $\log \sigma^2(x)$ of the synthetic simulator described in Section \ref{sec:1d-het}. Results are evaluated using $1{,}000$ validation points and summarized over $100$ independent training trials.
  • Figure 4: Scores (higher is better) for hetGP, bhetGP, and GDGP, trained on $n\in\{100,200,300,400,500\}$ unique input locations with a random number of replicates between $1$ and $100$ at each location, for emulating the cumulative attack proportion of the SIR simulator in Section \ref{sec:sir} at a time horizon of $100$. Scores are evaluated on a test set of size $N=60{,}000$, consisting of $2{,}000$ space-filling input locations with $30$ replicates at each location, and summarized over $20$ independent training trials for each $n$.
  • Figure 5: Scores (a) and computation times (b), averaged over $20$ repeated training trials, for the full (i.e., non-Vecchia-approximated) GDGP and Vecchia-approximated GDGPs with different training conditioning set sizes in $\{5, 15, 25, 50, 75\}$ and prediction conditioning set size fixed at $200$. Models are trained on datasets with $n=100,200,\dots,1000$ unique input locations, where the outputs at each location are generated a random number of times (between $1$ and $100$) by the SIR simulator (in Section \ref{sec:sir}). Scores are evaluated on a test set of size $N=60{,}000$, consisting of $2{,}000$ space-filling input locations with $30$ replicates at each location.
  • ...and 1 more figure

Theorems & Definitions (8)

  • Proposition 4.1
  • Proof 1
  • Proposition 4.2
  • Proof 2
  • Proposition 4.3: Small $N$, large $S_i$
  • Proposition 4.4: Large $N$, $S_i=1$
  • Proposition 4.5: Large $N$, $S_i>1$
  • Proof 3
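
The small-$N$/large-$N$ regimes in Propositions 4.3 to 4.5 distinguish how replicates enter inference. As a standard identity (stated here for intuition, not as a reproduction of the paper's propositions): with $S_i$ i.i.d. Gaussian replicates $y_{i1},\dots,y_{iS_i}\sim\mathcal{N}(f_i,\tau^2)$ at input location $i$,

$$\prod_{j=1}^{S_i}\mathcal{N}\big(y_{ij}\,\big|\,f_i,\tau^2\big) \;\propto\; \exp\!\left\{-\frac{S_i}{2\tau^2}\big(\bar y_i - f_i\big)^2\right\} \exp\!\left\{-\frac{1}{2\tau^2}\sum_{j=1}^{S_i}\big(y_{ij}-\bar y_i\big)^2\right\},$$

so, as a function of the latent value $f_i$, the likelihood depends on the replicates only through the per-location mean $\bar y_i$, with effective variance $\tau^2/S_i$. Reductions of this kind allow inference to operate on one summary per unique input location rather than on every replicate individually.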