A Bayesian Gaussian Process-Based Latent Discriminative Generative Decoder (LDGD) Model for High-Dimensional Data

Navid Ziaei; Behzad Nazari; Uri T. Eden; Alik Widge; Ali Yousefi

TL;DR

LDGD introduces a Bayesian, GP-based latent-variable framework that jointly models high-dimensional observations and their labels to learn a discriminative latent manifold. By employing two Gaussian process priors for continuous and categorical outputs and a doubly stochastic variational inference scheme with inducing points, LDGD achieves scalable, uncertainty-aware dimensionality reduction, accurate label prediction, and data generation. The model is validated on synthetic and real datasets (including Oil Flow, Iris, and MNIST), demonstrating competitive classification performance, interpretable latent-dimension selection via ARD, and credible high-dimensional reconstructions. Fast LDGD further enables real-time inference by mapping observations to latent parameters through a neural network, preserving predictive quality while enhancing scalability and applicability to larger datasets.

Abstract

Extracting meaningful information from high-dimensional data poses a formidable modeling challenge, particularly when the data is obscured by noise or represented through different modalities. This research proposes a novel non-parametric modeling approach, leveraging the Gaussian process (GP), to characterize high-dimensional data by mapping it to a latent low-dimensional manifold. This model, named the latent discriminative generative decoder (LDGD), employs both the data and associated labels in the manifold discovery process. We derive a Bayesian solution to infer the latent variables, allowing LDGD to effectively capture inherent stochasticity in the data. We demonstrate applications of LDGD on both synthetic and benchmark datasets. Not only does LDGD infer the manifold accurately, but its accuracy in predicting data points' labels surpasses state-of-the-art approaches. In developing LDGD, we incorporate inducing points to reduce the computational complexity of Gaussian processes on large datasets, which enables batch training for more efficient processing and better scalability. Additionally, we show that LDGD can robustly infer the manifold and precisely predict labels in scenarios where the data size is limited, demonstrating its capability to efficiently characterize high-dimensional data with limited samples. These collective attributes highlight the importance of developing non-parametric modeling approaches to analyze high-dimensional data.
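
To make the role of inducing points concrete, the following is a minimal NumPy sketch, not the authors' implementation. It builds an RBF kernel with one lengthscale per dimension (ARD) and approximates the full N x N kernel matrix through M inducing inputs, so that only an M x M matrix has to be factorized. It uses a plain Nystrom-style low-rank approximation as a stand-in for the doubly stochastic variational scheme the paper describes. The function names `ard_rbf_kernel` and `nystrom_approximation`, the toy data, and the lengthscale values are all assumptions made for illustration.

```python
import numpy as np


def ard_rbf_kernel(A, B, lengthscales, variance=1.0):
    """RBF kernel with one lengthscale per input dimension (ARD)."""
    A_scaled = A / lengthscales
    B_scaled = B / lengthscales
    sq_dist = (
        np.sum(A_scaled**2, axis=1)[:, None]
        + np.sum(B_scaled**2, axis=1)[None, :]
        - 2.0 * A_scaled @ B_scaled.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return variance * np.exp(-0.5 * np.maximum(sq_dist, 0.0))


def nystrom_approximation(X, Z, lengthscales, jitter=1e-6):
    """Approximate K_xx by K_xz K_zz^{-1} K_zx using inducing inputs Z."""
    K_xz = ard_rbf_kernel(X, Z, lengthscales)
    K_zz = ard_rbf_kernel(Z, Z, lengthscales) + jitter * np.eye(len(Z))
    L = np.linalg.cholesky(K_zz)        # only an M x M factorization
    A = np.linalg.solve(L, K_xz.T)      # shape (M, N)
    return A.T @ A


# Hypothetical toy data: 200 points in a 3-dimensional latent space.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))

# A very large lengthscale makes the kernel nearly insensitive to that
# dimension, which is how ARD can effectively switch a dimension off.
lengthscales = np.array([1.0, 1.0, 1e3])

# Assumption: inducing inputs are a random subset of the data (M = 20).
Z = X[rng.choice(200, size=20, replace=False)]

K_exact = ard_rbf_kernel(X, X, lengthscales)
K_approx = nystrom_approximation(X, Z, lengthscales)
rel_err = np.linalg.norm(K_exact - K_approx) / np.linalg.norm(K_exact)
print(f"relative Frobenius error with 20 inducing points: {rel_err:.3e}")
```

In this sketch the expensive step scales with the number of inducing inputs M rather than the number of data points N, which is the general reason inducing points make batch training of GP models practical. In a variational treatment the inducing inputs would typically be optimized together with the other parameters instead of being fixed to a data subset.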

Paper Structure (40 sections, 76 equations, 6 figures, 3 tables, 4 algorithms)

Introduction
Materials and Methods
Gaussian process Regression
Gaussian Process Latent Variable Model
Latent Discriminative Generative Decoder Model
Observed Data
Latent Variables
Gaussian process Priors
Observed Data Model
Marginal Likelihood Lower Bound
Inducing Points and Variational Distributions
Doubly Stochastic Variational GP and Variational Posterior Distribution.
Classification Expected Log-Likelihood ($\text{ELL}^{\text{cls}}$)
Regression Expected Log-Likelihood ($\text{ELL}^{\text{reg}}$)
Training procedure
...and 25 more sections

Figures (6)

Figure 1: Graphical Models Depicting LDGD. (a) Exact inference, (b) Variational inference
Figure 2: Visualization of Dimensionality Reduction on Synthetic Data. (A) displays the first two dimensions of the moon-like dataset. (B-D) illustrate the latent space heatmap representation across different synthetic data dimensions: 10 dimensions (B), 20 dimensions (C), and 40 dimensions (D). Red and blue points represent class 1 samples ($y^c=0$) and class 2 samples ($y^c=1$), respectively. Green crosses indicate classification-inducing points, while yellow crosses denote regression-inducing points, with five inducing points used for each. The green points are more uniformly distributed over the space because they are used in constructing $\mathbf{Y}^r$. The yellow ones are aligned on two sides of the decision boundary, where they are most needed for correct classification. The heatmap visualizes the model's uncertainty level (posterior variance). (E) shows the training curve (ELBO loss) for synthetic data with ten dimensions, where the latent space is also set to 10 dimensions. (F-G) depict the ARD coefficients for the classification kernel (F) and the regression kernel (G). The trained coefficients show that the model selects two latent dimensions for decoding labels and employs almost all dimensions to reconstruct data in the original 10-dimensional space. (H) displays a scatter plot of the training points in the lower-dimensional space for the two most dominant dimensions, while (I) shows the scatter plot for test points where the labels are unknown.
Figure 3: Comparative analysis of latent space representation in the Iris and Oil Flow datasets. (A) reveals the most dominant latent dimension for the Iris dataset, identified through ARD coefficients. (B) illustrates the scatter plot of the two dominant dimensions in the latent space for the Iris dataset, highlighting the data's intrinsic clustering. The vertical and horizontal error bars show the variance at each data point in latent space. (C) displays the most dominant latent dimension for the Oil Flow dataset, as determined by ARD coefficients. (D) presents the scatter plot of the two dominant dimensions in the latent space for the Oil Flow dataset, showcasing its unique distribution.
Figure 4: Two-dimensional visualization of dataset projections using various dimensionality reduction techniques. The top row displays projections of the Oil Flow dataset, while the bottom row shows the Iris dataset. Techniques used include (A) PCA, (B) t-SNE, (C) GPLVM, (D) Bayesian GPLVM, (E) FGPLVM, (F) SGPLVM, (G) SLLGPLVM, and (H) LDGD. For SLLGPLVM and LDGD, the inferred low-dimensional representation reflects the class labels, leading to separate regions for different data classes. With LDGD, even a one-dimensional representation of the data is possible, so LDGD might achieve a better reduction rate.
Figure 5: LDGD Data Generation Analysis. (A-D) Panels (A) and (C) present scatter plots of data points in the 2D latent space for the Iris and Oil Flow datasets, respectively. For each dataset, a sample point near each cluster is randomly selected (marked with a cross) to illustrate the model's generative capabilities. The corresponding high-dimensional reconstructions of these marked points are shown for the Iris dataset (B) and the Oil Flow dataset (D), alongside the real values of other points. (E) displays nine randomly chosen test points in the latent space for the MNIST dataset and their reconstructed images through the model's generative path. The generated images properly reconstruct the corresponding digits.
...and 1 more figures
