CRADLE-VAE: Enhancing Single-Cell Gene Perturbation Modeling with Counterfactual Reasoning-based Artifact Disentanglement
Seungheun Baek, Soyon Park, Yan Ting Chok, Junhyun Lee, Jueon Park, Mogan Gim, Jaewoo Kang
TL;DR
CRADLE-VAE addresses the challenge of modeling single-cell perturbation responses in the presence of technical artifacts. It combines a variational autoencoder with counterfactual reasoning to disentangle perturbation effects, basal cell state, and artifact-related variation. The model introduces an auxiliary KL-based loss that aligns counterfactual artifact states with a reference, which supports artifact disentanglement and the generation of artifact-free cellular responses. Across four Perturb-seq datasets, CRADLE-VAE surpasses baselines on ATE-$\rho$, ATE-$R^2$, Jaccard, and especially QC Pass Rate, indicating stronger generative quality and better generalization to unseen perturbations. This artifact-aware, causal approach reduces reliance on aggressive QC filtering and yields artifact-resilient cellular response predictions relevant to drug discovery and precision therapeutics.
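To make the idea of a KL-based alignment term concrete, the sketch below computes the closed-form KL divergence between two diagonal Gaussians, which is one common way such a term is written in VAE-style models. It is an illustration only: the function name `gaussian_kl`, the variable names, and the choice of a standard-normal reference are assumptions made here, not the loss as defined in the CRADLE-VAE paper or codebase.

```python
import math


def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    """KL(q || p) between two diagonal Gaussians, summed over latent dimensions.

    Each argument is a list of floats with one entry per latent dimension.
    """
    total = 0.0
    for mq, lq, mp, lp in zip(mu_q, logvar_q, mu_p, logvar_p):
        var_q = math.exp(lq)
        var_p = math.exp(lp)
        total += 0.5 * (lp - lq + (var_q + (mq - mp) ** 2) / var_p - 1.0)
    return total


# Hypothetical example: a counterfactual artifact posterior (q) is pulled
# toward a reference distribution (p). Both are assumed values for illustration.
mu_counterfactual = [0.5, -0.2]
logvar_counterfactual = [0.0, 0.0]
mu_reference = [0.0, 0.0]
logvar_reference = [0.0, 0.0]

aux_loss = gaussian_kl(
    mu_counterfactual, logvar_counterfactual, mu_reference, logvar_reference
)
print(f"auxiliary KL term: {aux_loss:.4f}")
```

In a training loop, a term of this kind would typically be added to the usual VAE objective with a weighting coefficient; how CRADLE-VAE weights and combines it is not stated in this summary.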
Abstract
Predicting cellular responses to various perturbations is a critical focus in drug discovery and personalized therapeutics, and deep learning models play a significant role in this endeavor. However, single-cell datasets contain technical artifacts that may hinder the predictive performance of such models, making quality control an important concern in this area. To address this, we propose CRADLE-VAE, a causal generative framework tailored for single-cell gene perturbation modeling, enhanced with counterfactual reasoning-based artifact disentanglement. Throughout training, CRADLE-VAE models the underlying latent distribution of technical artifacts and perturbation effects present in single-cell datasets. It employs counterfactual reasoning to disentangle such artifacts by modulating the latent basal spaces, and it learns robust features for generating higher-quality cellular response data. Experimental results demonstrate that this approach improves both treatment effect estimation performance and generative quality. The CRADLE-VAE codebase is publicly available at https://github.com/dmis-lab/CRADLE-VAE.
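As a rough illustration of what treatment effect estimation metrics of this kind measure, the sketch below computes a per-gene average treatment effect (mean perturbed expression minus mean control expression), a Pearson correlation between predicted and observed effects, and a Jaccard index between two gene sets. These definitions are assumptions based on common usage; the exact metric definitions, gene selection, and normalization used for ATE-$\rho$ and Jaccard in the paper are not given in this summary, and all function names and toy values here are hypothetical.

```python
import math


def average_treatment_effect(perturbed, control):
    """Per-gene mean expression difference between perturbed and control cells.

    Each argument is a list of cells; each cell is a list of per-gene values.
    """
    n_genes = len(control[0])

    def gene_mean(cells, g):
        return sum(cell[g] for cell in cells) / len(cells)

    return [gene_mean(perturbed, g) - gene_mean(control, g) for g in range(n_genes)]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def jaccard(genes_a, genes_b):
    """Jaccard index between two collections of gene identifiers."""
    a, b = set(genes_a), set(genes_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Toy data (assumed): two cells per condition, three genes.
control = [[1.0, 1.0, 2.0], [1.0, 1.0, 2.0]]
observed = [[2.0, 0.0, 2.0], [4.0, 2.0, 4.0]]
generated = [[3.0, 1.0, 3.0], [3.0, 1.0, 2.0]]

ate_observed = average_treatment_effect(observed, control)
ate_generated = average_treatment_effect(generated, control)
print("ATE correlation:", pearson(ate_generated, ate_observed))
print("Jaccard:", jaccard({"G1", "G2", "G3"}, {"G2", "G3", "G4"}))
```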
