Amortized Variational Inference for Deep Gaussian Processes

Qiuxian Meng, Yongyou Zhang

TL;DR

This work introduces amortized variational inference for Deep Gaussian Processes (AVDGP), replacing global variational parameters with an inference network that yields input-dependent inducing variables across GP layers. By employing input-conditioned priors and posteriors and leveraging three amortization strategies (AR1, AR2, AR2P), the method constructs a richer, non-degenerate prior and a flexible marginal posterior while maintaining scalable training via a tractable ELBO. Empirical results on toy and real-world regression benchmarks show AVDGP, particularly AR2P, achieving superior RMSE and CRPS with competitive or better calibration than strong baselines, and with favorable computational characteristics. The approach thus broadens the practical applicability of DGPs by balancing expressivity and efficiency, enabling principled uncertainty quantification in deeper, nonlinear function mappings.

Abstract

Gaussian processes (GPs) are Bayesian nonparametric models for function approximation with principled predictive uncertainty estimates. Deep Gaussian processes (DGPs) are multilayer generalizations of GPs that can represent complex marginal densities as well as complex mappings. As exact inference is either computationally prohibitive or analytically intractable in GPs and extensions thereof, some existing methods resort to variational inference (VI) techniques for tractable approximations. However, the expressivity of conventional approximate GP models critically relies on independent inducing variables that might not be informative enough for some problems. In this work we introduce amortized variational inference for DGPs, which learns an inference function that maps each observation to variational parameters. The resulting method enjoys a more expressive prior conditioned on fewer input-dependent inducing variables and a flexible amortized marginal posterior that is able to model more complicated functions. We show with theoretical reasoning and experimental results that our method performs similarly to or better than previous approaches at lower computational cost.

Paper Structure

This paper contains 41 sections, 51 equations, 11 figures, 5 tables.

Figures (11)

  • Figure 1: Probabilistic graphical models of left: SVGP and right: IDSGP. SVGP uses independent inducing variables $\mathbf{u}$ for approximation. In IDSGP an inference function denoted by the green arrows maps each input $\boldsymbol{x}_n$ to the input dependent inducing variables $\mathbf{u}_n$.
  • Figure 2: Probabilistic graphical model of 2-layered DGP with AVI. Inference functions denoted by the green arrows map the outputs $F_n^{l-1}$ at the $(l-1)$-th layer to the variational parameters $\{\mathbf{Z}_n^l,\boldsymbol{\mu}_n^l,\boldsymbol{\Sigma}_n^l\}$ of the $l$-th layer.
  • Figure 3: Samples $f^{(1:L)}(\boldsymbol{x})$ drawn successively from 1-d priors with zero mean functions. Top: conventional DGP priors ($M=128$); bottom: amortized DGP priors ($M=4$). Each column corresponds to a different number of layers, where 1 layer corresponds to a shallow GP. With conventional DGP priors, the sampled function begins to concentrate at a few values after a few layers, i.e., the prior fails to model functions of interest. This degeneration does not occur with amortized DGP priors.
  • Figure 4: Graphical model representations of left: 2-layered DS-DGP, with $S=2$ Monte Carlo samples, right: AR1 for 2-layered DGP with AVI, with $S=2$ Monte Carlo samples.
  • Figure 5: Graphical model representations of left: AR2 for 3-layered DGP with AVI, with $S=2$ Monte Carlo samples, right: AR2P for 3-layered DGP with AVI, with $S=2$ quadrature points.
  • ...and 6 more figures
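
The inference function described in the captions of Figures 1 and 2 can be illustrated with a minimal NumPy sketch for a single GP layer. It is not the authors' implementation: the one-hidden-layer network, the RBF kernel, the helper names (`init_inference_net`, `amortized_params`, `marginal_posterior`), and the choice to place the inducing inputs $\mathbf{Z}_n$ as learned offsets around the input are all assumptions made for illustration. The marginal uses the standard sparse-GP form with a zero prior mean, with the global variational parameters replaced by the per-input $\{\mathbf{Z}_n,\boldsymbol{\mu}_n,\boldsymbol{\Sigma}_n\}$. In a DGP, the input to the $l$-th layer's inference function would be the previous layer's output $F_n^{l-1}$ rather than $\boldsymbol{x}_n$.

```python
import numpy as np

def rbf(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between rows of a (n, d) and b (m, d)."""
    sq = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2.0 * a @ b.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def init_inference_net(rng, d_in, m, hidden=16):
    """Hypothetical one-hidden-layer MLP (architecture is an assumption).
    Output layout: Z offsets (m * d_in), mu (m), raw Cholesky factor (m * m)."""
    d_out = m * d_in + m + m * m
    return {
        "W1": 0.5 * rng.standard_normal((d_in, hidden)),
        "b1": np.zeros(hidden),
        "W2": 0.1 * rng.standard_normal((hidden, d_out)),
        "b2": np.zeros(d_out),
    }

def amortized_params(params, x, m):
    """Map one input x of shape (d_in,) to (Z_n, mu_n, Sigma_n)."""
    d_in = x.shape[0]
    h = np.tanh(x @ params["W1"] + params["b1"])
    out = h @ params["W2"] + params["b2"]
    # Assumption: inducing inputs are learned offsets around x.
    Z = x[None, :] + out[: m * d_in].reshape(m, d_in)
    mu = out[m * d_in : m * d_in + m]
    raw = out[m * d_in + m :].reshape(m, m)
    # Lower-triangular factor with positive diagonal, so Sigma is positive definite.
    L = np.tril(raw, -1) + np.diag(np.exp(np.diag(raw)))
    return Z, mu, L @ L.T

def marginal_posterior(x, Z, mu, Sigma, jitter=1e-6):
    """Sparse-GP marginal q(f(x)) under a zero prior mean:
    mean = K_xZ K_ZZ^{-1} mu,  var = k_xx - K_xZ K_ZZ^{-1} (K_ZZ - Sigma) K_ZZ^{-1} K_Zx."""
    m = Z.shape[0]
    Kzz = rbf(Z, Z) + jitter * np.eye(m)
    Kxz = rbf(x[None, :], Z)              # shape (1, m)
    A = np.linalg.solve(Kzz, Kxz.T).T     # K_xZ K_ZZ^{-1}, shape (1, m)
    mean = (A @ mu).item()
    var = (rbf(x[None, :], x[None, :]) - A @ (Kzz - Sigma) @ A.T).item()
    return mean, var

rng = np.random.default_rng(0)
params = init_inference_net(rng, d_in=2, m=4)
x = np.array([0.3, -1.2])
Z, mu, Sigma = amortized_params(params, x, m=4)
mean, var = marginal_posterior(x, Z, mu, Sigma)
```

Because every input gets its own small set of inducing variables, the cost per data point depends on the (small) $M$ used here rather than on a large shared inducing set, which is the trade-off the Figure 3 caption contrasts ($M=4$ versus $M=128$). The weights in this sketch are random; in practice they would be trained by maximizing the ELBO.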