Departures: Distributional Transport for Single-Cell Perturbation Prediction with Neural Schrödinger Bridges

Changxi Chi, Yufei Huang, Jun Xia, Jiangbin Zheng, Yunfan Liu, Zelin Zang, Stan Z. Li

TL;DR

Departures tackles unpaired single-cell perturbation prediction by directly aligning control and perturbed distributions through an approximate Schrödinger Bridge. It learns two stochastic bridges, one for continuous gene expression and one for discrete gene activation, guided by minibatch optimal transport to avoid bidirectional training. The method jointly trains continuous and discrete dynamics, enabling robust generation of perturbed cell profiles and capturing cellular heterogeneity. On Adamson and sci-Plex3 datasets, Departures achieves state-of-the-art distribution-level alignment and improves gene-activation predictions, signaling a scalable approach to perturbation forecasting in single-cell genomics.

Abstract

Predicting single-cell perturbation outcomes directly advances gene function analysis and facilitates drug candidate selection, making it a key driver of both basic and translational biomedical research. However, a major bottleneck in this task is the unpaired nature of single-cell data: the same cell cannot be observed both before and after perturbation because sequencing is destructive. Although some neural generative transport models attempt to handle unpaired single-cell perturbation data, they either lack explicit conditioning or depend on prior spaces for indirect distribution alignment, which limits precise perturbation modeling. In this work, we approximate the Schrödinger Bridge (SB), which defines stochastic dynamic mappings that recover entropy-regularized optimal transport (OT), to directly align the distributions of control and perturbed single-cell populations across different perturbation conditions. Unlike prior SB approximations that rely on bidirectional modeling to infer the optimal source-target sample coupling, we use minibatch-OT-based pairing to avoid such bidirectional inference and the associated ill-posedness of defining the reverse process. This pairing directly guides bridge learning, yielding a scalable approximation to the SB. We approximate two SB models, one for discrete gene activation states and the other for continuous expression distributions. Joint training enables accurate perturbation modeling and captures single-cell heterogeneity. Experiments on public genetic and drug perturbation datasets show that our model effectively captures heterogeneous single-cell responses and achieves state-of-the-art performance.
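
To make the minibatch-OT pairing idea concrete, here is a minimal sketch of how unpaired control and perturbed minibatches could be coupled and then connected by a Brownian bridge. It is an illustration under stated assumptions, not the authors' implementation: the Sinkhorn solver, the squared Euclidean cost, the cost rescaling, and the names `sinkhorn_coupling`, `sample_pairs`, and `brownian_bridge_sample` are all hypothetical choices, and the toy Gaussian data stands in for real expression profiles.

```python
import numpy as np


def sinkhorn_coupling(x0, x1, eps=1.0, n_iters=200):
    """Entropy-regularized OT plan between two uniformly weighted minibatches."""
    # Pairwise squared Euclidean cost (an assumed cost function).
    cost = ((x0[:, None, :] - x1[None, :, :]) ** 2).sum(-1)
    # Rescale by the mean cost for numerical stability (a choice of this sketch).
    kernel = np.exp(-cost / (eps * (cost.mean() + 1e-12)))
    a = np.full(len(x0), 1.0 / len(x0))
    b = np.full(len(x1), 1.0 / len(x1))
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        # Alternate scaling so row sums approach a and column sums approach b.
        u = a / (kernel @ v)
        v = b / (kernel.T @ u)
    return u[:, None] * kernel * v[None, :]


def sample_pairs(plan, rng):
    """Draw (source, target) index pairs with probability given by the plan."""
    flat = plan.ravel() / plan.sum()
    idx = rng.choice(plan.size, size=plan.shape[0], p=flat)
    return np.unravel_index(idx, plan.shape)


def brownian_bridge_sample(x0, x1, t, sigma, rng):
    """Sample x_t from a Brownian bridge pinned at x0 (t=0) and x1 (t=1)."""
    mean = (1.0 - t) * x0 + t * x1
    std = sigma * np.sqrt(t * (1.0 - t))
    return mean + std * rng.standard_normal(x0.shape)


rng = np.random.default_rng(0)
control = rng.normal(0.0, 1.0, size=(64, 5))    # toy "control" cells
perturbed = rng.normal(3.0, 1.0, size=(64, 5))  # toy "perturbed" cells

plan = sinkhorn_coupling(control, perturbed)
i, j = sample_pairs(plan, rng)
t = rng.uniform(size=(64, 1))
x_t = brownian_bridge_sample(control[i], perturbed[j], t, sigma=0.5, rng=rng)
```

In a training loop of this kind, the intermediate states `x_t` would be the inputs on which a forward-only bridge model is regressed, so no reverse process has to be learned to obtain the coupling.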

Paper Structure

This paper contains 18 sections, 18 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: Single-cell perturbation data are unpaired as RNA-seq is destructive.
  • Figure 2: Overview of Departures. Conditioned on cell type $ct$, the model learns the distributional transition of single-cell gene expression profiles from the control population $\pi_{0}^{ct}$ to perturbed population $\pi_{T}^{ct,P}$. The problem is formulated by decoupling into two components: (1) modeling the distributional shift of gene expression levels before and after perturbation (Continuous), and (2) modeling the distributional shift of gene activation status induced by perturbation (Discrete).
  • Figure 3: Perturbations induce natural, directional transitions from control to treated states. In contrast, reconstructing control states from perturbed ones is ill-posed, and training a backward model adds significant computational overhead.
  • Figure 4: Violin plots comparing predicted and actual expression levels of the top differentially expressed (DE) genes under the TMEM167A knockout condition, which was unseen during training, from the Adamson dataset.
  • Figure 5: UMAP visualization of predicted and actual gene activation states. The left panel shows results for the unseen HSD17B12 perturbation condition from the Adamson test set. The right panel shows predictions for a held-out perturbation (compound Sodium – dosage 0.001 – cell type MCF7) from the sci-Plex3 test set.
  • ...and 1 more figures