Sparse Inducing Points in Deep Gaussian Processes: Enhancing Modeling with Denoising Diffusion Variational Inference

Jian Xu, Delu Zeng, John Paisley

TL;DR

This work tackles the biased posterior inference of inducing points in Deep Gaussian Processes (DGPs) by introducing Denoising Diffusion Variational Inference (DDVI). DDVI leverages a forward diffusion of inducing-point latents and a time-reversed SDE, aided by score matching via a neural network, to obtain posterior samples and an explicit variational lower bound for the marginal likelihood. The approach incorporates a bridge process trick to render KL terms tractable and provides a reparameterization-based SGD pipeline for scalable training and prediction. Empirical results across regression, image classification, and unsupervised data recovery demonstrate that DDVI yields more accurate posteriors, better predictive performance, and improved training stability compared to strong baselines such as DSVI, IPVI, and SGHMC.

Abstract

Deep Gaussian processes (DGPs) provide a robust paradigm for Bayesian deep learning. In DGPs, a set of sparse integration locations called inducing points is selected to approximate the posterior distribution of the model, which reduces computational complexity and improves model efficiency. However, inferring the posterior distribution of the inducing points is not straightforward, and traditional variational inference approaches to posterior approximation often lead to significant bias. To address this issue, we propose an alternative method called Denoising Diffusion Variational Inference (DDVI), which uses a denoising diffusion stochastic differential equation (SDE) to generate posterior samples of the inducing variables. We rely on score matching methods for denoising diffusion models to approximate the score functions with a neural network. Furthermore, by combining the classical mathematical theory of SDEs with the minimization of the KL divergence between the approximate and true processes, we propose a novel explicit variational lower bound for the marginal likelihood function of DGPs. Through experiments on various datasets and comparisons with baseline methods, we empirically demonstrate the effectiveness of DDVI for posterior inference of inducing points in DGP models.
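
To make the idea of generating samples with a time-reversed diffusion SDE concrete, the following is a minimal sketch and not the authors' implementation. It assumes a one-dimensional Gaussian target N(m, s^2) and a variance-preserving forward SDE, dx = -0.5 * beta * x dt + sqrt(beta) dW, for which the score of every intermediate marginal is known in closed form. In DDVI the score is instead approximated by a neural network and the samples are inducing variables of a DGP. The function name reverse_sde_samples and all of its parameters are hypothetical and chosen only for illustration.

```python
import math
import random


def reverse_sde_samples(m=2.0, s=0.5, beta=2.0, T=5.0,
                        n_steps=500, n_samples=2000, seed=0):
    """Draw approximate samples from N(m, s^2) by Euler-Maruyama
    integration of a reverse-time SDE (toy setting, analytic score)."""
    rng = random.Random(seed)
    dt = T / n_steps
    noise_scale = math.sqrt(beta * dt)

    # Start from the reference distribution N(0, 1), which the forward
    # process approximately reaches at time T.
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]

    for k in range(n_steps, 0, -1):
        t = k * dt
        decay = math.exp(-0.5 * beta * t)
        # Closed-form forward marginal p_t = N(mean_t, var_t).
        mean_t = m * decay
        var_t = s * s * decay * decay + 1.0 - decay * decay
        for i in range(n_samples):
            x = xs[i]
            # Analytic score of p_t (assumption: a learned network
            # would replace this line in a realistic setting).
            score = -(x - mean_t) / var_t
            # Reverse-time drift: f(x, t) - g(t)^2 * score.
            drift = -0.5 * beta * x - beta * score
            # One Euler-Maruyama step backwards in time.
            xs[i] = x - drift * dt + noise_scale * rng.gauss(0.0, 1.0)
    return xs


if __name__ == "__main__":
    samples = reverse_sde_samples()
    mean = sum(samples) / len(samples)
    var = sum((x - mean) ** 2 for x in samples) / len(samples)
    print(f"sample mean: {mean:.3f}, sample std: {math.sqrt(var):.3f}")
```

With the default settings, the sample mean and standard deviation should land close to the target values m and s, up to discretization and Monte Carlo error. Replacing the analytic score with a learned approximation is where the score matching described in the abstract would enter.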

Paper Structure (20 sections, 35 equations, 3 figures, 3 tables)

This paper contains 20 sections, 35 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Regression test RMSE results for our DDVI method (red), SGHMC (blue), IPVI (green), and DSVI (black) for DGPs on ten UCI benchmark datasets. The numbers 2, 3, 4, and 5 denote the number of layers in the DGP. Lower is better. The mean is shown with error bars of one standard error. The dimensions of the data are displayed above each subplot.
  • Figure 2: Regression test mean NLL results for our DDVI method (red), SGHMC (blue), IPVI (green), and DSVI (black) for DGPs on ten UCI benchmark datasets. The numbers 2, 3, 4, and 5 denote the number of layers in the DGP. Lower is better. The mean is shown with error bars of one standard error. The dimensions of the data are displayed above each subplot.
  • Figure 3: The Brendan faces reconstruction task with 75% missing pixels. The top row shows the ground truth data and the bottom row shows the reconstructions from the 20-dimensional latent distribution.