Amortized Variational Inference for Deep Gaussian Processes
Qiuxian Meng, Yongyou Zhang
TL;DR
This work introduces amortized variational inference for Deep Gaussian Processes (AVDGP), replacing global variational parameters with an inference network that yields input-dependent inducing variables across GP layers. By employing input-conditioned priors and posteriors and leveraging three amortization strategies (AR1, AR2, AR2P), the method constructs a richer, non-degenerate prior and a flexible marginal posterior while maintaining scalable training via a tractable ELBO. Empirical results on toy and real-world regression benchmarks show AVDGP, particularly AR2P, achieving superior RMSE and CRPS with competitive or better calibration than strong baselines, and with favorable computational characteristics. The approach thus broadens the practical applicability of DGPs by balancing expressivity and efficiency, enabling principled uncertainty quantification in deeper, nonlinear function mappings.
Abstract
Gaussian processes (GPs) are Bayesian nonparametric models for function approximation with principled predictive uncertainty estimates. Deep Gaussian processes (DGPs) are multilayer generalizations of GPs that can represent complex marginal densities as well as complex mappings. As exact inference is either computationally prohibitive or analytically intractable in GPs and extensions thereof, some existing methods resort to variational inference (VI) techniques for tractable approximations. However, the expressivity of conventional approximate GP models critically relies on independent inducing variables that might not be informative enough for some problems. In this work we introduce amortized variational inference for DGPs, which learns an inference function that maps each observation to variational parameters. The resulting method enjoys a more expressive prior conditioned on fewer input-dependent inducing variables and a flexible amortized marginal posterior that is able to model more complicated functions. We show with theoretical reasoning and experimental results that our method performs similarly to or better than previous approaches at lower computational cost.
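To make the idea of an inference function concrete, the following is a minimal sketch of a single GP layer with scalar inputs, in which a small network maps an input x to its own inducing locations and variational parameters, and the standard sparse variational GP marginal is then evaluated at x. It is an illustration under stated assumptions, not the AR1, AR2, or AR2P construction: the network shape, the RBF kernel, the diagonal variational covariance, and every name (`init_net`, `inference_net`, `amortized_marginal`) are hypothetical.

```python
import math
import random


def rbf(a, b, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel between two scalar inputs.
    return variance * math.exp(-0.5 * (a - b) ** 2 / lengthscale ** 2)


def solve(A, b):
    # Gaussian elimination with partial pivoting for a small dense system A x = b.
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def init_net(rng, hidden, num_inducing):
    # Hypothetical one-hidden-layer network with three outputs per inducing
    # variable: a location, a variational mean, and a log standard deviation.
    out = 3 * num_inducing
    return {
        "w1": [rng.gauss(0, 1) for _ in range(hidden)],
        "b1": [rng.gauss(0, 1) for _ in range(hidden)],
        "w2": [[rng.gauss(0, 0.5) for _ in range(hidden)] for _ in range(out)],
        "b2": [rng.gauss(0, 0.5) for _ in range(out)],
    }


def inference_net(params, x, num_inducing):
    # Map one scalar input to input-dependent inducing locations z and the
    # parameters (mean m, std s) of a diagonal Gaussian over inducing values.
    h = [math.tanh(w * x + b) for w, b in zip(params["w1"], params["b1"])]
    out = [sum(w * hi for w, hi in zip(row, h)) + b
           for row, b in zip(params["w2"], params["b2"])]
    z = out[:num_inducing]
    m = out[num_inducing:2 * num_inducing]
    s = [math.exp(v) for v in out[2 * num_inducing:]]
    return z, m, s


def amortized_marginal(params, x, num_inducing, jitter=1e-6):
    # Standard sparse-GP marginal q(f(x)), using the inducing set predicted for x.
    z, m, s = inference_net(params, x, num_inducing)
    kzz = [[rbf(zi, zj) + (jitter if i == j else 0.0)
            for j, zj in enumerate(z)] for i, zi in enumerate(z)]
    kzx = [rbf(zi, x) for zi in z]
    a = solve(kzz, kzx)  # a = Kzz^{-1} k_zx
    mean = sum(ai * mi for ai, mi in zip(a, m))
    var = (rbf(x, x)
           - sum(ai * ki for ai, ki in zip(a, kzx))
           + sum((ai * si) ** 2 for ai, si in zip(a, s)))
    return mean, var


rng = random.Random(0)
params = init_net(rng, hidden=8, num_inducing=3)
mean, var = amortized_marginal(params, x=0.3, num_inducing=3)
```

Because the inducing set is predicted per input, only a handful of inducing variables are needed for each evaluation. In this sketch the cost of the linear solve depends on `num_inducing` alone, not on a shared global inducing set.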
