Modelling Cellular Perturbations with the Sparse Additive Mechanism Shift Variational Autoencoder

Michael Bereket, Theofanis Karaletsos

TL;DR

SAMS-VAE, the Sparse Additive Mechanism Shift Variational Autoencoder, decomposes each cell's latent state into a basal component and sparse, additive perturbation offsets, enabling compositional and interpretable modeling of cellular perturbations. The model uses sparsity-inducing priors and a correlated inference scheme to sparsify perturbation effects and better disentangle latent factors, with CPA-VAE serving as an ablated comparison model. It is evaluated on Perturb-seq scRNA-seq data using marginal likelihood estimated via the importance-weighted ELBO (IWELBO) and a posterior predictive check based on average treatment effects (ATE), demonstrating improved generalization and interpretability over baselines such as CPA-VAE and SVAE+. The work also proposes a framework linking model-based ATE to differential expression and shows quantitative and qualitative recovery of known biological pathways, highlighting the method's potential for guiding biology-driven discovery and iterative experimentation.

Abstract

Generative models of observations under interventions have been a vibrant topic of interest across machine learning and the sciences in recent years. For example, in drug discovery, there is a need to model the effects of diverse interventions on cells in order to characterize unknown biological mechanisms of action. We propose the Sparse Additive Mechanism Shift Variational Autoencoder, SAMS-VAE, to combine compositionality, disentanglement, and interpretability for perturbation models. SAMS-VAE models the latent state of a perturbed sample as the sum of a local latent variable capturing sample-specific variation and sparse global variables of latent intervention effects. Crucially, SAMS-VAE sparsifies these global latent variables for individual perturbations to identify disentangled, perturbation-specific latent subspaces that are flexibly composable. We evaluate SAMS-VAE both quantitatively and qualitatively on a range of tasks using two popular single-cell sequencing datasets. In order to measure perturbation-specific model properties, we also introduce a framework for evaluation of perturbation models based on average treatment effects with links to posterior predictive checks. SAMS-VAE outperforms comparable models in terms of generalization across in-distribution and out-of-distribution tasks, including a combinatorial reasoning task under resource paucity, and yields interpretable latent structures which correlate strongly to known biological mechanisms. Our results suggest SAMS-VAE is an interesting addition to the modeling toolkit for machine learning-driven scientific discovery.
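
The additive composition described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `compose_latents`, the Bernoulli mask probability, the standard normal embeddings, and the toy dimensions are all assumptions chosen for illustration, and the decoder that maps latents to expression is omitted.

```python
import numpy as np

def compose_latents(z_basal, dosages, masks, embeddings):
    """Add sparse perturbation offsets to per-cell basal latents.

    z_basal:    (n_cells, n_latent) local, sample-specific latent variables
    dosages:    (n_cells, n_perts)  indicator of which perturbations each cell received
    masks:      (n_perts, n_latent) binary masks selecting the latent dims a perturbation shifts
    embeddings: (n_perts, n_latent) global perturbation effect vectors
    """
    offsets = masks * embeddings        # sparse per-perturbation shifts
    return z_basal + dosages @ offsets  # sum the shifts of all applied perturbations

# Toy setup (all sizes and distributions are illustrative assumptions).
rng = np.random.default_rng(0)
n_cells, n_perts, n_latent = 4, 3, 5
z_basal = rng.normal(size=(n_cells, n_latent))
masks = rng.binomial(1, 0.2, size=(n_perts, n_latent))
embeddings = rng.normal(size=(n_perts, n_latent))

# Row 0: control cell; rows 1 and 2: single perturbations; row 3: both combined.
dosages = np.array([[0, 0, 0],
                    [1, 0, 0],
                    [0, 1, 0],
                    [1, 1, 0]], dtype=float)

z = compose_latents(z_basal, dosages, masks, embeddings)
```

In this sketch the shift for the combined perturbation in row 3 is exactly the sum of the two single-perturbation shifts, which is the sense in which additive latent offsets are composable.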

Paper Structure (42 sections, 19 equations, 10 figures, 1 table, 2 algorithms)

Figures (10)

  • Figure 1: SAMS-VAE generative process
  • Figure 2: Visualization of inferred latent perturbation masks and embedding means for the best performing checkpoint of each model in replogle-filtered. We visualize the latent variables for the 345 perturbations with pathway annotations from replogle and group by pathway. The SAMS-VAE and CPA-VAE models were trained with our proposed correlated inference strategy.
  • Figure 3: We visualize model-estimated treatment effects ($\text{ATE}_{\text{SAMS-VAE}}$) and data-estimated differential expression ($\text{DE}_{\text{Data}}$) for intervention-gene pairs in the Replogle experiment. We observe broad correlation (Pearson $r=0.765$): for example, perturbations of ribosomal subunits influence expression broadly with matching directionality, while other guides exhibit more targeted effects.
  • Figure 4: Results from norman-ood and norman-data-efficiency experiments. Within splits, test IWELBO values are plotted relative to the test IWELBO for SAMS-VAE trained with 0 combinations on that split (relative IWELBO) to enable comparison across splits. SAMS-VAE and CPA-VAE models are trained with the correlated inference schemes described in methods.
  • Figure 5: CPA-VAE generative process
  • ...and 5 more figures
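
The comparison in Figure 3 between model-estimated treatment effects and data-estimated differential expression can be sketched as follows. This is a hedged illustration on synthetic data, not the paper's pipeline: it assumes the data-side estimate is a per-gene difference in mean expression between perturbed and control cells, and `ate_model` is a noisy stand-in for the values a trained model would supply. The helper names `differential_expression` and `pearson_r` are hypothetical.

```python
import numpy as np

def differential_expression(expr, is_perturbed):
    """Per-gene mean expression of perturbed cells minus that of control cells."""
    return expr[is_perturbed].mean(axis=0) - expr[~is_perturbed].mean(axis=0)

def pearson_r(a, b):
    """Pearson correlation between two arrays, flattened over all entries."""
    return float(np.corrcoef(np.ravel(a), np.ravel(b))[0, 1])

# Synthetic example: 200 control and 200 perturbed cells over 50 genes.
rng = np.random.default_rng(0)
n_genes = 50
true_effect = rng.normal(size=n_genes)
control = rng.normal(size=(200, n_genes))
perturbed = rng.normal(size=(200, n_genes)) + true_effect
expr = np.vstack([control, perturbed])
is_perturbed = np.arange(expr.shape[0]) >= 200

de_data = differential_expression(expr, is_perturbed)

# Stand-in for model-estimated average treatment effects (assumption for illustration).
ate_model = true_effect + 0.1 * rng.normal(size=n_genes)

r = pearson_r(ate_model, de_data)
```

With real data, `ate_model` would hold one value per intervention-gene pair, and flattening before the correlation pools all pairs into a single coefficient.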