Variational Learning of Gaussian Process Latent Variable Models through Stochastic Gradient Annealed Importance Sampling

Jian Xu; Shian Du; Junmei Yang; Qianli Ma; Delu Zeng

TL;DR

The paper tackles loose variational bounds and weight collapse in high-dimensional GPLVMs by introducing VAIS-GPLVM, a variational approach that uses Annealed Importance Sampling (AIS) with time-inhomogeneous unadjusted Langevin diffusion to transform the posterior into a sequence of bridging distributions. It derives a tractable AIS-based evidence lower bound (ELBO) via reparameterization, which enables stochastic gradient optimization, and reports tighter bounds, higher log-likelihoods, and more robust convergence on toy and image datasets than the mean-field (MF) and importance-weighted (IW) baselines. The experiments also show improved reconstruction performance, better uncertainty handling for unseen data with missing values, and significantly better effective sample size (ESS) and weight-entropy metrics, indicating reduced weight degeneracy. The work offers a scalable, principled framework for variational learning in GPLVMs, with potential impact on dimensionality reduction and missing data recovery in complex, high-dimensional settings.
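The weight-degeneracy diagnostics mentioned above can be illustrated with a minimal sketch. It assumes the standard textbook definitions (ESS as the inverse sum of squared normalized weights, and the Shannon entropy of the normalized weights); the paper's exact definitions may differ, and the function names here are hypothetical.

```python
import numpy as np

def normalized_weights(log_w):
    """Turn unnormalized log-weights into weights that sum to one."""
    log_w = np.asarray(log_w, dtype=float)
    w = np.exp(log_w - np.max(log_w))  # subtract the max for numerical stability
    return w / w.sum()

def effective_sample_size(log_w):
    """ESS = 1 / sum_i w_i^2; ranges from 1 (collapsed) to K (uniform)."""
    w = normalized_weights(log_w)
    return 1.0 / np.sum(w ** 2)

def weight_entropy(log_w):
    """Shannon entropy of the normalized weights; ranges from 0 to log K."""
    w = normalized_weights(log_w)
    nonzero = w[w > 0]  # treat 0 * log 0 as 0
    return -np.sum(nonzero * np.log(nonzero))
```

Under these definitions, K equal weights give an ESS of K and an entropy of log K, while a single dominant weight drives the ESS toward 1 and the entropy toward 0, which is the collapse that the ESS and weight-entropy comparison is meant to detect.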

Abstract

Gaussian Process Latent Variable Models (GPLVMs) have become increasingly popular for unsupervised tasks such as dimensionality reduction and missing data recovery due to their flexibility and non-linear nature. An importance-weighted version of the Bayesian GPLVM has been proposed to obtain a tighter variational bound. However, this approach is largely limited to simple data structures, because generating an effective proposal distribution becomes challenging in high-dimensional spaces or with complex datasets. In this work, we propose an Annealed Importance Sampling (AIS) approach to address these issues. By transforming the posterior into a sequence of intermediate distributions using annealing, we combine the strengths of Sequential Monte Carlo samplers and variational inference (VI) to explore a wider range of posterior distributions and gradually approach the target distribution. We further propose an efficient algorithm by reparameterizing all variables in the evidence lower bound (ELBO). Experimental results on both toy and image datasets demonstrate that our method outperforms state-of-the-art methods in terms of tighter variational bounds, higher log-likelihoods, and more robust convergence.
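The annealing idea in the abstract can be sketched on a toy problem. The example below is not the paper's GPLVM algorithm: it assumes a 1-D Gaussian target, a standard normal initial proposal, a linear annealing schedule, a fixed step size, and a backward kernel that reuses the same Langevin step, all chosen for illustration. Every function name is hypothetical. Each particle is moved by unadjusted Langevin steps through the bridges pi_beta ∝ q^(1-beta) * p_tilde^beta, and because each step is a deterministic function of Gaussian noise, the averaged log-weight is a reparameterized lower bound on log Z.

```python
import numpy as np

# Toy 1-D setup (illustrative assumptions, not the paper's GPLVM model).
TARGET_MEAN, TARGET_STD = 3.0, 0.5
LOG_Z = np.log(TARGET_STD * np.sqrt(2.0 * np.pi))  # true log normalizer

def log_q(x):
    """Log density of the initial proposal q = N(0, 1)."""
    return -0.5 * x**2 - 0.5 * np.log(2.0 * np.pi)

def log_p_tilde(x):
    """Unnormalized log target: a Gaussian kernel without its constant."""
    return -0.5 * ((x - TARGET_MEAN) / TARGET_STD) ** 2

def grad_log_pi(x, beta):
    """Score of the bridge pi_beta, proportional to q^(1-beta) * p_tilde^beta."""
    return (1.0 - beta) * (-x) + beta * (-(x - TARGET_MEAN) / TARGET_STD**2)

def log_gauss(x, mean, var):
    """Log density of N(mean, var) evaluated at x."""
    return -0.5 * (x - mean) ** 2 / var - 0.5 * np.log(2.0 * np.pi * var)

def ais_log_weights(n_particles, n_steps, step, rng):
    """Per-particle log-weights; n_steps = 0 gives the plain ELBO integrand."""
    x = rng.standard_normal(n_particles)  # x_0 ~ q, reparameterized
    log_w = -log_q(x)
    for k in range(1, n_steps + 1):
        beta = k / n_steps  # linear annealing schedule (assumption)
        noise = rng.standard_normal(n_particles)
        fwd_mean = x + step * grad_log_pi(x, beta)
        x_new = fwd_mean + np.sqrt(2.0 * step) * noise  # unadjusted Langevin step
        bwd_mean = x_new + step * grad_log_pi(x_new, beta)
        # Ratio of backward to forward transition densities for this step.
        log_w += log_gauss(x, bwd_mean, 2.0 * step) - log_gauss(x_new, fwd_mean, 2.0 * step)
        x = x_new
    return log_w + log_p_tilde(x)

rng = np.random.default_rng(0)
plain_elbo = ais_log_weights(4000, 0, 0.05, rng).mean()
ais_elbo = ais_log_weights(4000, 100, 0.05, rng).mean()
print(f"log Z = {LOG_Z:.3f}, plain ELBO = {plain_elbo:.3f}, AIS ELBO = {ais_elbo:.3f}")
```

Since both transition kernels are normalized densities, the exponentiated weight is an unbiased estimate of Z, so its expected logarithm cannot exceed log Z. In a gradient-based setting one would write the same computation in an automatic-differentiation framework and optimize the proposal and schedule parameters; that part is omitted here.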

Paper Structure (29 sections, 44 equations, 11 figures, 6 tables)

Introduction
Background
GPLVM Variational Inference
Importance-weighted Variational Inference
Variational AIS Scheme in GPLVMs
Variational Inference via AIS
Time-inhomogeneous Unadjusted Langevin Diffusion
Reparameterization Trick and Stochastic Gradient Descent
Related Work
IWVI
Differentiable AIS
Diffusion models
Experiments
Baseline Methods
Dimensionality Reduction
...and 14 more sections

Figures (11)

Figure 1: The graphical models of (a) IW and (b) our method. We leverage an annealing procedure to transform the posterior distribution into a sequence of intermediate distributions.
Figure 2: We reduced the dimensionality of the multi-phase oilflow dataset using our proposed method and visualized a two-dimensional slice of the latent space that corresponds to the most dominant latent dimensions. The inverse lengthscales learnt with the SE-ARD kernel for each dimension are depicted in the middle plot, and the negative ELBO learning curves are shown in the right plot. Using the same learning rate, we compared our learning curve with those of two state-of-the-art GPLVM methods, MF and importance-weighted VI, over 3000 iterations.
Figure 3: In the Brendan faces reconstruction task with 75% missing pixels, the top row represents the ground truth data and the bottom row showcases the reconstructions from the 20-dimensional latent distribution.
Figure 4: The negative ELBO convergence curves of the three methods on the Frey Faces dataset. Note that as the number of iterations increases, the y-axis scale gradually increases from left to right.
Figure 5: For MNIST with 75% missing pixels, we used digits 1 and 7. The bottom row shows ground truth, while the top row shows reconstructions from the 5D latent space. The 2D plot on the right visualizes the dimensions with the smallest lengthscales.
...and 6 more figures
