The Deep Generative Decoder: MAP estimation of representations improves modeling of single-cell RNA data

Viktoria Schuster, Anders Krogh

TL;DR

The paper presents the Deep Generative Decoder (DGD), an encoder-free, MAP-based framework that learns latent representations and decoder parameters jointly by maximizing the joint probability $P(X,Z,\phi,\theta)$. This is equivalent to maximizing the posterior of $Z$, $\phi$ and $\theta$ given the data $X$. The latent space is modeled with a parameterized Gaussian Mixture Model (GMM), and its parameters get priors such as a softball prior on the component means. This gives the DGD a flexible, interpretable latent structure with lower dimensionality than typical VAEs. Demonstrations on Fashion-MNIST and a broad collection of single-cell RNA-seq datasets show that the DGD yields meaningful sub-clustering, competitive reconstruction, and clustering performance that is superior or comparable while using substantially fewer latent dimensions. Inference for new data points is slower than with encoder-based methods, because each new representation must be found by optimization rather than by a single encoder pass. In exchange, the approach offers simplicity, scalability, and easy extension to more complex latent distributions, making it well suited for biological data analysis and potential multi-omics integration. The provided code bases enable replication and further development of encoder-free generative modeling in biomedical contexts.
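
The softball prior on the GMM component means can be illustrated with a small sketch. This is an assumption-laden illustration, not the authors' code. It assumes a density that is approximately uniform inside a ball of radius $r$, normalized by the ball's volume, with a smooth logistic fall-off outside. The names `softball_log_prob` and `softplus` and the `sharpness` parameter are hypothetical.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def softball_log_prob(mu, radius, sharpness=10.0):
    """Approximate log-density of a 'softball' prior for one GMM component mean.

    Assumed form (illustrative only): roughly uniform inside a d-ball of the
    given radius, with a logistic fall-off at the boundary whose steepness is
    set by the hypothetical `sharpness` parameter.
    """
    d = len(mu)
    # log(1 / V_d(r)), where V_d(r) = pi^(d/2) * r^d / Gamma(d/2 + 1) is the d-ball volume
    log_norm = math.lgamma(1.0 + 0.5 * d) - d * math.log(radius) - 0.5 * d * math.log(math.pi)
    norm = math.sqrt(sum(x * x for x in mu))
    # Close to 0 well inside the ball, grows roughly linearly in |mu| outside it
    return log_norm - softplus(sharpness * (norm / radius - 1.0))


inside = softball_log_prob([0.1, -0.2], radius=1.0)   # near the centre of a 2-D unit ball
outside = softball_log_prob([3.0, 0.0], radius=1.0)   # far outside the ball
```

A mean near the origin receives roughly the uniform density $1/V_d(r)$. A mean far outside the ball is penalized approximately linearly in its norm. This keeps the mixture components within a bounded region of the latent space without favouring any particular location inside it.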

Abstract

Learning low-dimensional representations of single-cell transcriptomics has become instrumental to its downstream analysis. The state of the art is currently represented by neural network models such as variational autoencoders (VAEs), which use a variational approximation of the likelihood for inference. We here present the Deep Generative Decoder (DGD), a simple generative model that computes model parameters and representations directly via maximum a posteriori (MAP) estimation. VAEs typically use a fixed Gaussian distribution because adding other types is complex. The DGD, in contrast, naturally handles complex parameterized latent distributions. We first show its general functionality on a commonly used benchmark set, Fashion-MNIST. Secondly, we apply the model to multiple single-cell data sets. Here the DGD learns low-dimensional, meaningful and well-structured latent representations with sub-clustering beyond the provided labels. The advantages of this approach are its simplicity and its capability to provide representations of much smaller dimensionality than a comparable VAE.
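
The core idea can be illustrated with a minimal NumPy sketch. A free representation is learned for every sample together with the decoder by gradient ascent on the log joint $\log P(X\mid Z,\theta) + \log P(Z\mid\phi) + \log P(\phi) + \log P(\theta)$, and no encoder is involved. The sketch makes several simplifying assumptions that are not the authors' setup:

- a linear decoder stands in for the neural network;
- the likelihood is Gaussian with unit variance rather than a count distribution;
- the GMM has isotropic, unit-variance components with fixed uniform weights, and only its means are learned;
- the priors on $\theta$ and $\phi$ are flat.

All function names are hypothetical. The second function, `infer_new`, illustrates why inference for unseen data is slower than an encoder pass: the new representation has to be optimized while the decoder and GMM stay frozen.

```python
import numpy as np


def gmm_log_prob(z, means, log_w, var=1.0):
    """Isotropic GMM log-density at points z (n, d); also returns responsibilities and z - mu."""
    d = z.shape[1]
    diff = z[:, None, :] - means[None, :, :]                     # (n, k, d)
    log_comp = (log_w[None, :] - 0.5 * np.sum(diff ** 2, axis=-1) / var
                - 0.5 * d * np.log(2 * np.pi * var))              # (n, k)
    m = log_comp.max(axis=1, keepdims=True)
    log_p = m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))   # log-sum-exp over components
    resp = np.exp(log_comp - log_p[:, None])                     # component responsibilities
    return log_p, resp, diff


def fit_map(X, n_latent=2, n_comp=3, steps=2000, lr=1e-3, seed=0):
    """Gradient ascent on log P(X | Z, W) + log P(Z | means), with no encoder."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    Z = 0.1 * rng.standard_normal((n, n_latent))      # one free representation per sample
    W = 0.1 * rng.standard_normal((n_latent, p))       # linear stand-in for the decoder
    means = rng.standard_normal((n_comp, n_latent))    # GMM component means (learned)
    log_w = np.full(n_comp, -np.log(n_comp))           # uniform mixture weights (kept fixed)
    history = []
    for _ in range(steps):
        R = X - Z @ W                                  # reconstruction residuals
        log_p, resp, diff = gmm_log_prob(Z, means, log_w)
        history.append(-0.5 * np.sum(R ** 2) + log_p.sum())  # log joint up to a constant
        grad_Z = R @ W.T - np.sum(resp[:, :, None] * diff, axis=1)
        grad_W = Z.T @ R
        grad_means = np.sum(resp[:, :, None] * diff, axis=0)
        Z += lr * grad_Z
        W += lr * grad_W
        means += lr * grad_means
    return Z, W, means, log_w, history


def infer_new(x, W, means, log_w, steps=1000, lr=5e-3):
    """Representation for an unseen sample: optimise z with decoder and GMM frozen."""
    z = np.zeros((1, W.shape[0]))
    for _ in range(steps):
        r = x[None, :] - z @ W
        _, resp, diff = gmm_log_prob(z, means, log_w)
        z += lr * (r @ W.T - np.sum(resp[:, :, None] * diff, axis=1))
    return z[0]


# Toy data: 60 samples from three clusters in a 2-D latent space, mapped to 5-D
rng = np.random.default_rng(1)
centers = np.array([[-2.0, 0.0], [2.0, 0.0], [0.0, 2.0]])
Z_true = centers[rng.integers(0, 3, size=60)] + 0.3 * rng.standard_normal((60, 2))
X = Z_true @ rng.standard_normal((2, 5)) + 0.1 * rng.standard_normal((60, 5))

Z, W, means, log_w, history = fit_map(X)
z_new = infer_new(X[0], W, means, log_w)
```

Because the latent distribution enters only as the term $\log P(Z\mid\phi)$ in the objective, replacing the GMM with another parameterized density only changes `gmm_log_prob` and its gradients. No reparameterization trick or variational bound is needed.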

Paper Structure

This paper contains 41 sections, 6 equations, 12 figures, 4 tables, 1 algorithm.

Figures (12)

  • Figure 1: The model. a) Graphical model and b) schematic of the deep generative decoder. The DGD consists of a decoder of any desired architecture with parameters $\theta$ mapping the latent representation $Z$ to the data space $X$. The representation is modelled by a probability distribution with parameters $\phi$. $N$ is the number of samples.
  • Figure 1: Fashion-MNIST hyperparameter search. The parallel coordinate plot from the corresponding wandb project shows combinations of hyperparameters (all coordinates of the plot except the last) and the resulting models' reconstruction performance on the validation set. Each model is represented by a line colored by its validation reconstruction loss (binary cross-entropy, BCE). A total of 132 different models were tested.
  • Figure 2: Fashion-MNIST latent representations and samples. The dimension of all latent spaces included in this figure is 20. The number of GMM components is denoted as "c". a) Generated images of samples drawn from each component of the DGD with 20 Gaussian components. The index of the component sampled from is shown on the x axis. b)-c) Latent representations visualized as UMAP dimensions 1 and 2, colored by data class. The UMAP was computed with 50 neighbors and a minimum distance of 0.7. b) Latent representation from the model with 20 GMM components. Numbers in the plot show the positions of the corresponding component means, from which samples were drawn in a). c) Latent representation from the supervised DGD with 10 components. d) Test image reconstructions from DGD, VAD and VAE. The top row shows the first 20 original test images. The remaining 3 rows show test image reconstructions. Names of the corresponding models are given on the left. A model labeled with one component uses a standard Gaussian. e) Randomly sampled images from DGD, VAD and VAE. Corresponding models are indicated by plot titles.
  • Figure 2: Latent spaces for varying numbers of Gaussian components and latent dimensionalities trained on Fashion-MNIST. a) DGDs with a 2-dimensional latent space are trained with 1, 10 and 20 Gaussian components (GC). Latent points are colored by their type. This refers to whether they are learned representations, samples drawn from the GMM or component means. b) UMAP projections of 20-dimensional latent spaces with 1, 10 and 20 Gaussian components. The top row is colored by sample class, the bottom by latent point type as in a).
  • Figure 3: Latent spaces of scDGD. The latent spaces are shown colored by cell type (left) and type of latent point (right). Visualizations are achieved using UMAP with a spread of 5 and a minimum distance of 1. 'Representation' refers to learned representations of training data, 'samples' correspond to random samples drawn from the GMM and 'gmm means' represents the component means. The IDs of the component means are shown in black text at their corresponding coordinates. a) Latent space of scDGD with 9 Gaussian components. This model was trained for 700 epochs. b) Latent space of scDGD with 18 components, trained for 600 epochs. c)-e) Spatial relationships between GMM components of the 18-component scDGD. c) Heatmap of the GMM's component assignments, given as the percentage of samples of each class for which a component has the highest probability. d) Graph visualization of the GMM component means with edge lengths correlated with Euclidean distances from a) and edge widths negatively correlated with Euclidean distances. Only the 95th percentile of distances were accepted as graph edges, resulting in a threshold of 0.14. Edge colors are grey or show components belonging to the same cell type (same colors as in panel b, except blue, which refers to all CD4 cells minus memory T cells).
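
The component-assignment heatmap described for Figure 3c assigns each sample to its most probable GMM component and reports, per class, the percentage of samples falling in each component. The following NumPy sketch computes such a table under assumed simplifications. It uses isotropic components that share one variance `var`, although the model's actual covariance structure may differ. It also assumes integer class labels, and the function name `component_assignment_table` is hypothetical.

```python
import numpy as np


def component_assignment_table(Z, labels, means, log_weights, var):
    """Percentage of samples of each class whose most probable GMM component is k.

    Z: (n, d) latent representations; labels: (n,) integer class labels.
    Assumes isotropic components with a shared variance `var`.
    """
    sq = np.sum((Z[:, None, :] - means[None, :, :]) ** 2, axis=-1)   # (n, k) squared distances
    # Unnormalised log posterior over components; the Gaussian normalising
    # constant is identical for all k here, so it does not affect the argmax
    log_post = log_weights[None, :] - 0.5 * sq / var
    best = np.argmax(log_post, axis=1)
    classes = np.unique(labels)
    table = np.zeros((len(classes), means.shape[0]))
    for row, c in enumerate(classes):
        members = best[labels == c]
        counts = np.bincount(members, minlength=means.shape[0])
        table[row] = 100.0 * counts / len(members)                     # each row sums to 100
    return classes, table


Z = np.array([[0.0, 0.1], [0.2, -0.1], [3.0, 3.1], [2.9, 3.0]])
labels = np.array([0, 0, 1, 1])
means = np.array([[0.0, 0.0], [3.0, 3.0]])
classes, table = component_assignment_table(Z, labels, means, np.log([0.5, 0.5]), var=1.0)
# table is [[100, 0], [0, 100]]: class 0 maps entirely to component 0, class 1 to component 1
```

A class whose row spreads over several components corresponds to the sub-clustering beyond the provided labels that the DGD latent spaces show.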