Learning Identifiable Factorized Causal Representations of Cellular Responses
Haiyi Mao, Romain Lopez, Kai Liu, Jan-Christian Hütter, David Richmond, Panayiotis V. Benos, Lin Qiu
TL;DR
This work addresses how single-cell perturbation responses depend on biological context by proposing Factorized Causal Representation (FCR), an identifiable deep generative model that decomposes cellular states into covariate-specific ($\mathbf{z}_x$), interaction-specific ($\mathbf{z}_{tx}$), and treatment-specific ($\mathbf{z}_t$) latent blocks. Building on nonlinear ICA theory, it provides component-wise identifiability guarantees for $\mathbf{z}_{tx}$ and block-wise identifiability guarantees for $\mathbf{z}_x$ and $\mathbf{z}_t$ under rich experimental variability. The method combines variational inference with causal structure regularization and permutation discriminators to enforce the desired independence properties. It is validated on four real single-cell datasets, where it shows improved clustering and conditional independence testing results, as well as competitive single-cell response predictions. The framework yields interpretable, interaction-aware representations of cellular responses across diverse biological contexts, a step toward precision-medicine applications.
Abstract
The study of cells and their responses to genetic or chemical perturbations promises to accelerate the discovery of therapeutic targets. However, designing adequate and insightful models for such data is difficult because the response of a cell to perturbations essentially depends on its biological context (e.g., genetic background or cell type). For example, when searching for therapeutic targets, one may want to enrich for drugs that specifically target a certain cell type. This challenge emphasizes the need for methods that explicitly take into account potential interactions between drugs and contexts. Towards this goal, we propose a novel Factorized Causal Representation (FCR) learning method that reveals causal structure in single-cell perturbation data from several cell lines. Based on the framework of identifiable deep generative models, FCR learns disentangled cellular representations comprising covariate-specific ($\mathbf{z}_x$), treatment-specific ($\mathbf{z}_{t}$), and interaction-specific ($\mathbf{z}_{tx}$) blocks. Building on recent advances in nonlinear ICA theory, we prove the component-wise identifiability of $\mathbf{z}_{tx}$ and the block-wise identifiability of $\mathbf{z}_t$ and $\mathbf{z}_x$. We then present our implementation of FCR and empirically demonstrate that it outperforms state-of-the-art baselines in various tasks across four single-cell datasets.
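To make the three-block factorization concrete, the following is a minimal sketch of one way it could be written down. It is not taken from the text above. The symbols $\mathbf{x}$ (observed covariates such as cell line), $\mathbf{t}$ (treatment), $\mathbf{y}$ (measured gene expression), and $g$ (a nonlinear mixing function) are assumed notation, and the conditional dependencies shown are an illustrative reading of the terms "covariate-specific", "treatment-specific", and "interaction-specific".

$$
\mathbf{z}_x \sim p(\mathbf{z}_x \mid \mathbf{x}), \qquad
\mathbf{z}_t \sim p(\mathbf{z}_t \mid \mathbf{t}), \qquad
\mathbf{z}_{tx} \sim p(\mathbf{z}_{tx} \mid \mathbf{t}, \mathbf{x}), \qquad
\mathbf{y} = g(\mathbf{z}_x, \mathbf{z}_{tx}, \mathbf{z}_t)
$$

Under this reading, only $\mathbf{z}_{tx}$ depends on treatment and context jointly. This is why it is the natural block for capturing context-dependent drug effects.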
