TRIDENT: A Trimodal Cascade Generative Framework for Drug and RNA-Conditioned Cellular Morphology Synthesis
Rui Peng, Ziru Liu, Lingyuan Ye, Yuxing Lu, Boxin Shi, Jinzhuo Wang
TL;DR
The paper tackles the lack of a unified Perturbation $\rightarrow$ RNA $\rightarrow$ Morphology pathway in virtual cell modeling. It introduces TRIDENT, a cascade framework that encodes pre-perturbation gene expression and drug information into a latent $z$ via a VAE, then uses a Latent Diffusion Model with cross-attention to generate high-fidelity morphologies conditioned on $z$. A new dataset, MorphoGene, pairs Cell Painting images with L1000 profiles for 98 compounds, enabling end-to-end training and evaluation, including in-distribution (ID) and out-of-distribution (OOD) generalization and a docetaxel case study with strong RNA–phenotype alignment (Pearson $r = 0.957$). TRIDENT achieves up to a 7× improvement over baselines in fidelity and produces MOA-aware, biology-consistent morphologies, bringing AI Virtual Cell modeling closer to predictive capability while highlighting data limitations and future directions for cross-cell-type generalization.
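As a rough illustration of the cascade summarized above, the following minimal sketch encodes a toy expression vector and drug embedding into a latent $z$ with a VAE-style reparameterization, then lets a few image-latent tokens attend over context tokens derived from $z$. It is not the authors' implementation: the dimensions, random weights, the chunking of $z$ into context tokens, and the absence of learned query/key/value projections are all assumptions made for brevity, and the diffusion denoising loop itself is omitted.

```python
import math
import random


def linear(x, w):
    """Multiply a vector x (length d_in) by a weight matrix w (d_in x d_out)."""
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) for j in range(len(w[0]))]


def rand_matrix(rng, rows, cols, scale=0.1):
    """Random Gaussian matrix; stands in for learned weights (assumption)."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def encode_condition(expr, drug, w_mu, w_logvar, rng):
    """Stage 1 (VAE-style): map [expression ; drug embedding] to a latent z."""
    x = expr + drug  # concatenate the two condition vectors
    mu = linear(x, w_mu)
    logvar = linear(x, w_logvar)
    # Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, 1).
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, context):
    """Stage 2 conditioning: each query token attends over the context tokens.

    Keys and values are the context tokens themselves (no learned projections).
    """
    d = len(context[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in context]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(wi * k[j] for wi, k in zip(w, context)) for j in range(d)])
    return outputs, weights


# Toy sizes, chosen only for illustration (hypothetical).
N_GENES, DRUG_DIM, Z_DIM, TOKEN_DIM = 6, 3, 8, 4
rng = random.Random(0)

expr = [rng.gauss(0.0, 1.0) for _ in range(N_GENES)]   # toy expression profile
drug = [rng.gauss(0.0, 1.0) for _ in range(DRUG_DIM)]  # toy drug embedding
w_mu = rand_matrix(rng, N_GENES + DRUG_DIM, Z_DIM)
w_logvar = rand_matrix(rng, N_GENES + DRUG_DIM, Z_DIM)

z = encode_condition(expr, drug, w_mu, w_logvar, rng)

# Split z into fixed-size context tokens for the attention layer (assumption).
context = [z[i:i + TOKEN_DIM] for i in range(0, Z_DIM, TOKEN_DIM)]

# Three toy image-latent tokens, standing in for a noisy latent in the denoiser.
image_tokens = rand_matrix(rng, 3, TOKEN_DIM, scale=1.0)
attended, attn = cross_attention(image_tokens, context)
```

In a full latent diffusion model, a layer like `cross_attention` would sit inside the denoising network and be applied at every denoising step, so that the generated morphology stays conditioned on $z$ throughout sampling.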
Abstract
Accurately modeling the relationship between perturbations, transcriptional responses, and phenotypic changes is essential for building an AI Virtual Cell (AIVC). However, existing methods are typically constrained to modeling direct associations, such as Perturbation $\rightarrow$ RNA or Perturbation $\rightarrow$ Morphology, and overlook the crucial causal link from RNA to morphology. To bridge this gap, we propose TRIDENT, a cascade generative framework that synthesizes realistic cellular morphology by conditioning on both the perturbation and the corresponding gene expression profile. To train and evaluate models on this task, we construct MorphoGene, a new dataset pairing L1000 gene expression with Cell Painting images for 98 compounds. TRIDENT significantly outperforms state-of-the-art approaches, achieving up to a 7-fold improvement with strong generalization to unseen compounds. In a case study on docetaxel, we validate that RNA-guided synthesis accurately produces the corresponding phenotype. An ablation study further confirms that this RNA conditioning is essential for the model's high fidelity. By explicitly modeling the transcriptome-to-phenome mapping, TRIDENT provides a powerful in silico tool and moves us closer to a predictive virtual cell.
