CAVACHON: a hierarchical variational autoencoder to integrate multi-modal single-cell data

Ping-Han Hsieh; Ru-Xiu Hsiao; Katalin Ferenc; Anthony Mathelier; Rebekka Burkholz; Chien-Yu Chen; Geir Kjetil Sandve; Tatiana Belova; Marieke Lydia Kuijjer

TL;DR

CAVACHON introduces a directed-acyclic-graph–guided hierarchical variational autoencoder to integrate multi-modal single-cell data while enforcing explicit conditional independencies between modalities. By learning modality-specific and shared latent representations in topological order, it enables isolation of common/distinct information, modality-wise differential analysis, and integrated multi-facet clustering, demonstrated on SNARE-Seq cortex and 10X PBMC datasets. The framework also supports chimeric profile generation to decompose differential expression by modality and provides a flexible, interpretable alternative to existing joint-embedding approaches. These capabilities facilitate hypothesis-driven analyses of regulatory interactions across modalities and can be extended to time-series or other multi-omics settings, with the implementation available in the authors’ repository.

Abstract

Paired single-cell sequencing technologies enable the simultaneous measurement of complementary modalities of molecular data at single-cell resolution. Along with the advances in these technologies, many methods based on variational autoencoders have been developed to integrate these data. However, these methods do not explicitly incorporate prior biological relationships between the data modalities, which could significantly enhance modeling and interpretation. We propose a novel probabilistic learning framework that explicitly incorporates conditional independence relationships between multi-modal data as a directed acyclic graph using a generalized hierarchical variational autoencoder. We demonstrate the versatility of our framework across various applications pertinent to single-cell multi-omics data integration. These include the isolation of common and distinct information from different modalities, modality-specific differential analysis, and integrated cell clustering. We anticipate that the proposed framework can facilitate the construction of highly flexible graphical models that can capture the complexities of biological hypotheses and unravel the connections between different biological data types, such as different modalities of paired single-cell multi-omics data. The implementation of the proposed framework can be found in the repository https://github.com/kuijjerlab/CAVACHON.
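The directed acyclic graph mentioned above fixes an order in which modalities can be processed: a modality is handled only after every modality it is conditioned on. The following is a minimal sketch of deriving such an order with Python's standard `graphlib` module. The modality names, the `parents` dictionary, and the `training_order` helper are hypothetical illustrations and are not taken from the CAVACHON repository.

```python
from graphlib import TopologicalSorter

# Hypothetical relational graph: each key is a modality and each value is the
# set of modalities it is conditioned on (its parents in the DAG). This mirrors
# the case where chromatin accessibility and transcription factors both
# regulate gene expression.
parents = {
    "chromatin_accessibility": set(),
    "transcription_factors": set(),
    "gene_expression": {"chromatin_accessibility", "transcription_factors"},
}


def training_order(parents):
    """Return the modalities so that every parent precedes its children.

    graphlib raises CycleError if the graph is not acyclic.
    """
    return list(TopologicalSorter(parents).static_order())


order = training_order(parents)
print(order)
```

In this sketch the two parent modalities may appear in either order, but `gene_expression` always comes last because it depends on both.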

Paper Structure (16 sections, 11 equations, 9 figures, 1 table)

This paper contains 16 sections, 11 equations, 9 figures, 1 table.

Introduction
Methods
Problem Definition
Probabilistic Generative Model
Contribution of Modalities to Differential Gene Expression
Datasets
Results
Isolating Common and Distinct Information of Modalities
Contribution of Modalities to Differential Gene Expression
Integrated Unsupervised Multi-facet Clustering
Discussion
Decomposition of ELBO
Multi-facet Clustering
Implementation Details
Training Strategy
...and 1 more sections

Figures (9)

Figure 1: Examples of relational graphs and graphical models. (a)-(d) Schematic diagrams of input graphs representing conditional independence relationships between modalities of data (left) and the graphical models created by our method (right). Note that, for simplicity, the batch information is omitted. The numbers under the names of the modalities in the rectangular boxes are the topological orders used for sequential training. The nodes in the graphical model are colored by the type of molecular assay. The dashed arrows denote the recognition model for posterior approximation, while the solid arrows denote the generative process. The figure shows examples with (a) joint learning, (b) states of chromatin accessibility that influence the expression of genes, (c) gene expression at later time points in a time-series experiment that depends on earlier time points, and (d) states of chromatin accessibility and transcription factors that regulate gene expression. (e) The architecture of the hierarchical variational autoencoder created for the graphical model of (b).
Figure 2: Conditional latent representations for the SNARE-Seq adult mouse cerebral cortex dataset. (a) The graphical model used to isolate the common and distinct representations between modalities. (b) t-SNE representation of the posterior mean of each modality's common and distinct latent representation. (c) Cell enrichment scores for each cell type (with 50 nearest neighbors). The cell enrichment score is the proportion of neighboring cells that share the same cell type, normalized by the number of cells in each cell type. The data points are colored by the annotated cell types from the original SNARE-Seq study. For clarity, distinct cell types uniquely identified in each modality are annotated and highlighted with circles.
Figure 3: (a) The graphical model used to analyze the contribution of modalities to differential gene expression in the 10k PBMCs dataset. (b) Schematic diagram illustrating the computation of the contribution score for each modality. The latent distributions from CD14 monocytes are represented in purple, while those from naive B cells are represented in green. The contribution score for each modality is calculated by quantifying the change in gene expression upon substituting the latent distribution of the targeted modality. (c) The contribution score of each modality to the differentially expressed genes based on the compositional Bayesian analysis of CD14 monocytes compared to naive B cells. Each point on the ternary plot denotes a unique differential expression profile, with its position indicating the contribution score of each modality to the changes in gene expression. The closer a point is to a vertex of the plot, the greater the contribution score of the corresponding modality. The size of the points reflects the fold-change of the expression between the two conditions, and the color indicates whether the gene is up- or down-regulated in CD14 monocytes compared to naive B cells. (d) The enrichment of TRRUST gene sets based on the differential expression driven by the transcription factor modality and by all modalities. The TRRUST gene sets contain curated transcriptional regulatory relationships between transcription factors and their target genes. The enrichment of a specific transcription factor indicates that most of the target genes regulated by that transcription factor exhibit a distinct profile. (NES: normalized enrichment score)
Figure 4: Multi-facet clustering of the 10k PBMCs dataset. The top panel illustrates the posterior mean of the latent representations, colored by cell types, while the bottom panel illustrates the posterior mean of the latent representations, colored by multi-facet cluster assignments. From left to right, the panels represent the posterior mean of the latent representation of chromatin accessibility, transcription factor expression, and the expression of other genes, conditioned on the latent representations of chromatin accessibility and transcription factor expression, respectively.
Figure S1: Data flow in our proposed framework. The framework takes a data matrix, cell annotation, and feature annotation (or alternatively h5ad files) as inputs, constructs Modality objects (which are compatible with AnnData), and merges these into a MultiModality object (which is compatible with MuData). Finally, a TensorFlow Dataset, used to load the data into the model, is created from the MultiModality object.
...and 4 more figures
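The cell enrichment score described in the Figure 2 caption can be sketched as follows. The caption does not give the exact normalization, so this example makes an assumption: the proportion of same-type neighbors is divided by the overall frequency of that cell type, so that a score of 1 corresponds to no enrichment over chance. The function name `cell_enrichment_scores` and the synthetic data are hypothetical and are not part of the authors' implementation.

```python
import numpy as np


def cell_enrichment_scores(latent, labels, k=50):
    """Per-cell enrichment of same-type cells among the k nearest neighbors.

    Assumption: the neighbor proportion is normalized by the fraction of all
    cells that carry the same label (one possible reading of the caption).
    """
    latent = np.asarray(latent, dtype=float)
    labels = np.asarray(labels)
    n = latent.shape[0]
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of cells")

    # Pairwise squared Euclidean distances (brute force, fine for small n).
    sq = np.sum(latent ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * latent @ latent.T
    np.fill_diagonal(dist, np.inf)  # a cell is not its own neighbor

    # Indices of the k closest cells for every cell (unordered within the k).
    neighbors = np.argpartition(dist, k - 1, axis=1)[:, :k]

    # Proportion of neighbors sharing the cell's own label.
    proportion = (labels[neighbors] == labels[:, None]).mean(axis=1)

    # Overall frequency of each label, looked up per cell.
    types, counts = np.unique(labels, return_counts=True)
    frequency = dict(zip(types, counts / n))
    expected = np.array([frequency[label] for label in labels])

    return proportion / expected


# Synthetic example: two well-separated groups of 10 cells each.
rng = np.random.default_rng(0)
demo_latent = np.vstack([
    rng.normal(0.0, 0.1, size=(10, 2)),
    rng.normal(10.0, 0.1, size=(10, 2)),
])
demo_labels = np.array(["type_a"] * 10 + ["type_b"] * 10)
demo_scores = cell_enrichment_scores(demo_latent, demo_labels, k=3)
print(demo_scores)
```

In practice the latent input would be the posterior means of a modality's latent representation, and scores could be averaged per cell type to compare modalities. For large datasets, a dedicated nearest-neighbor index would replace the brute-force distance matrix used here.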
