A Bayesian Gaussian Process-Based Latent Discriminative Generative Decoder (LDGD) Model for High-Dimensional Data
Navid Ziaei, Behzad Nazari, Uri T. Eden, Alik Widge, Ali Yousefi
TL;DR
LDGD introduces a Bayesian, GP-based latent-variable framework that jointly models high-dimensional observations and their labels to learn a discriminative latent manifold. By employing two Gaussian process priors for continuous and categorical outputs and a doubly stochastic variational inference scheme with inducing points, LDGD achieves scalable, uncertainty-aware dimensionality reduction, accurate label prediction, and data generation. The model is validated on synthetic and real datasets (including Oil Flow, Iris, and MNIST), demonstrating competitive classification performance, interpretable latent-dimension selection via ARD, and credible high-dimensional reconstructions. Fast LDGD further enables real-time inference by mapping observations to latent parameters through a neural network, preserving predictive quality while enhancing scalability and applicability to larger datasets.
Abstract
Extracting meaningful information from high-dimensional data poses a formidable modeling challenge, particularly when the data is obscured by noise or represented through different modalities. This research proposes a novel non-parametric modeling approach, leveraging the Gaussian process (GP), to characterize high-dimensional data by mapping it to a latent low-dimensional manifold. This model, named the latent discriminative generative decoder (LDGD), employs both the data and the associated labels in the manifold discovery process. We derive a Bayesian solution to infer the latent variables, allowing LDGD to effectively capture inherent stochasticity in the data. We demonstrate applications of LDGD on both synthetic and benchmark datasets. Not only does LDGD infer the manifold accurately, but its accuracy in predicting data points' labels surpasses state-of-the-art approaches. In developing LDGD, we incorporated inducing points to reduce the computational complexity of Gaussian processes on large datasets, which enables batch training and improves processing efficiency and scalability. Additionally, we show that LDGD can robustly infer the manifold and precisely predict labels in scenarios in which the data size is limited, demonstrating its capability to efficiently characterize high-dimensional data with limited samples. These collective attributes highlight the importance of developing non-parametric modeling approaches to analyze high-dimensional data.
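
To make the role of inducing points and ARD more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes an RBF kernel with one length scale per latent dimension (ARD) and uses a Nyström-style low-rank approximation built from m inducing inputs, which is the basic idea behind reducing the O(n^3) cost of an exact GP to O(n m^2). LDGD itself relies on doubly stochastic variational inference, which this sketch does not reproduce; the function names `ard_rbf` and `nystrom_approx`, the data, and all parameter values are hypothetical.

```python
import numpy as np

def ard_rbf(X1, X2, lengthscales, variance=1.0):
    """RBF kernel with a separate length scale per input dimension (ARD)."""
    A = X1 / lengthscales
    B = X2 / lengthscales
    # Squared Euclidean distances between the rescaled inputs.
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0))

def nystrom_approx(X, Z, lengthscales, jitter=1e-6):
    """Low-rank approximation K_nn ~ K_nm K_mm^{-1} K_mn with inducing inputs Z."""
    K_nm = ard_rbf(X, Z, lengthscales)
    K_mm = ard_rbf(Z, Z, lengthscales) + jitter * np.eye(len(Z))
    L = np.linalg.cholesky(K_mm)      # K_mm = L L^T
    V = np.linalg.solve(L, K_nm.T)    # shape (m, n)
    return V.T @ V                    # shape (n, n), rank at most m

# Toy latent points (assumed data, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
Z = X[rng.choice(200, size=20, replace=False)]  # 20 inducing inputs

# A very large length scale makes the kernel nearly ignore the third dimension,
# which is how ARD can switch off unneeded latent dimensions.
lengthscales = np.array([1.0, 1.0, 1e3])

K_full = ard_rbf(X, X, lengthscales)
K_low = nystrom_approx(X, Z, lengthscales)
rel_err = np.linalg.norm(K_full - K_low) / np.linalg.norm(K_full)
print(f"relative Frobenius error with 20 inducing inputs: {rel_err:.3f}")
```

In this sketch the inducing inputs are simply a random subset of the data; in a variational treatment they would typically be optimized jointly with the kernel hyperparameters.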
