CRADLE-VAE: Enhancing Single-Cell Gene Perturbation Modeling with Counterfactual Reasoning-based Artifact Disentanglement

Seungheun Baek, Soyon Park, Yan Ting Chok, Junhyun Lee, Jueon Park, Mogan Gim, Jaewoo Kang

TL;DR

Cradle-VAE addresses the challenge of modeling single-cell perturbation responses in the presence of technical artifacts by combining a variational autoencoder with counterfactual reasoning to disentangle perturbation effects, basal cell state, and artifact-related variation. The model introduces an auxiliary KL-based loss that aligns counterfactual artifact states with a reference, enabling robust artifact disentanglement and improved generation of artifact-free cellular responses. Across four Perturb-seq datasets, Cradle-VAE surpasses baselines on ATE-$\rho$, ATE-$R^2$, Jaccard, and especially QC Pass Rate, demonstrating stronger generative quality and better generalization to unseen perturbations. This artifact-aware, causal approach reduces reliance on aggressive QC filtering and provides artifact-resilient cellular response predictions with practical impact for drug discovery and precision therapeutics.
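To make the idea of a KL-based alignment term concrete, here is a minimal sketch of the closed-form KL divergence between two diagonal Gaussians. It is an illustration only: the summary above does not give the exact form of the auxiliary loss, so the Gaussian assumption, the function name `kl_diag_gaussians`, and the variable names for the counterfactual and reference artifact states are hypothetical and not taken from the CRADLE-VAE codebase.

```python
import math


def kl_diag_gaussians(mu_q, logvar_q, mu_p, logvar_p):
    """Closed-form KL(q || p) for diagonal Gaussians, summed over latent dimensions.

    Each argument is a sequence of floats of the same length
    (means and log-variances of q and p).
    """
    total = 0.0
    for mq, lq, mp, lp in zip(mu_q, logvar_q, mu_p, logvar_p):
        var_q = math.exp(lq)
        var_p = math.exp(lp)
        total += 0.5 * (lp - lq + (var_q + (mq - mp) ** 2) / var_p - 1.0)
    return total


# Hypothetical example: a counterfactual artifact state (q) compared against
# a reference state (p). The numbers are made up for illustration.
mu_counterfactual = [1.0, 1.0]
logvar_counterfactual = [0.0, 0.0]
mu_reference = [0.0, 0.0]
logvar_reference = [0.0, 0.0]

aux_loss = kl_diag_gaussians(
    mu_counterfactual, logvar_counterfactual, mu_reference, logvar_reference
)
print(aux_loss)
```

The term is zero only when the two distributions coincide, which is why minimizing a divergence of this kind pulls the counterfactual state toward the reference.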

Abstract

Predicting cellular responses to perturbations is a central problem in drug discovery and personalized therapeutics, and deep learning models play a significant role in it. Single-cell datasets, however, contain technical artifacts that can degrade the predictive performance of such models, making quality control a major concern in this area. To address this, we propose CRADLE-VAE, a causal generative framework tailored for single-cell gene perturbation modeling and enhanced with counterfactual reasoning-based artifact disentanglement. Throughout training, CRADLE-VAE models the underlying latent distributions of the technical artifacts and perturbation effects present in single-cell datasets. It employs counterfactual reasoning to disentangle such artifacts by modulating the latent basal spaces, and learns robust features for generating cellular response data of improved quality. Experimental results demonstrate that this approach improves both treatment effect estimation performance and generative quality. The CRADLE-VAE codebase is publicly available at https://github.com/dmis-lab/CRADLE-VAE.
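As a rough illustration of what treatment effect estimation performance can measure, the sketch below computes a per-gene average treatment effect (ATE) as mean perturbed expression minus mean control expression, then correlates the ATE from generated cells with the ATE from observed cells. This reading of an ATE correlation metric is an assumption, since the abstract does not define the metric. The helper names and the toy expression values are hypothetical.

```python
import math


def column_means(cells):
    """Mean expression per gene; `cells` is a list of per-cell expression lists."""
    n = len(cells)
    return [sum(col) / n for col in zip(*cells)]


def average_treatment_effect(perturbed, control):
    """Per-gene ATE: mean perturbed expression minus mean control expression."""
    return [p - c for p, c in zip(column_means(perturbed), column_means(control))]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Toy data (made up): 2 cells x 3 genes per condition.
control = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]
observed_perturbed = [[2.0, 2.0, 1.0], [2.0, 2.0, 1.0]]
generated_perturbed = [[3.0, 2.0, -1.0], [3.0, 2.0, -1.0]]

ate_observed = average_treatment_effect(observed_perturbed, control)
ate_generated = average_treatment_effect(generated_perturbed, control)
ate_rho = pearson(ate_generated, ate_observed)
print(ate_observed, ate_generated, ate_rho)
```

In this toy case the generated effect is exactly twice the observed one, so the correlation is perfect even though the magnitudes differ. A correlation-style score alone therefore does not capture effect size.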

Paper Structure (50 sections, 12 equations, 10 figures, 8 tables, 3 algorithms)

Figures (10)

  • Figure 1: Illustration of the training process and generative process of Cradle-VAE.
  • Figure 2: Graphical model of Cradle-VAE. $\bullet$ represents Hadamard product operation; $\otimes$ represents matrix multiplication operation; $\oplus$ represents vector concatenation.
  • Figure 3: Violin plots showing the data (blue) and model-generated (green) distributions of POLD3-perturbed cellular response for each QC sub-criterion. The red dotted line marks the predefined QC threshold; the green-colored region represents QC-passed values and the red-colored region represents QC-failed values.
  • Figure 4: t-SNE plots labelled by the presence of artifacts (leftmost panel) and by perturbation type (three right panels) for Cradle-VAE, conditional-VAE, and SAMS-VAE, respectively.
  • Figure 5: Violin plots of GAB2-perturbed cellular response for each QC sub-criterion.
  • ...and 5 more figures