TRIDENT: A Trimodal Cascade Generative Framework for Drug and RNA-Conditioned Cellular Morphology Synthesis

Rui Peng, Ziru Liu, Lingyuan Ye, Yuxing Lu, Boxin Shi, Jinzhuo Wang

TL;DR

The paper addresses the lack of a unified Perturbation $\rightarrow$ RNA $\rightarrow$ Morphology pathway in virtual cell modeling. It introduces TRIDENT, a cascade framework that uses a VAE to encode pre-perturbation gene expression and drug information into a latent $z$. A Latent Diffusion Model with cross-attention then generates high-fidelity morphologies conditioned on $z$. A new dataset, MorphoGene, pairs Cell Painting images with L1000 profiles for 98 compounds. It enables end-to-end training and robust evaluation, including in-distribution (ID) and out-of-distribution (OOD) generalization tests and a docetaxel case study with strong RNA–phenotype alignment (Pearson $r = 0.957$). TRIDENT achieves up to a 7× improvement in fidelity over baselines and produces mechanism-of-action (MOA)-aware, biologically consistent morphologies. The work brings AI Virtual Cell modeling closer to predictive capability, while acknowledging data limitations and pointing to cross-cell-type generalization as future work.
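To make the condition module concrete, here is a minimal NumPy sketch of a VAE-style encoder. It fuses a pre-perturbation expression vector with drug features and samples a latent $z$ using the reparameterization trick. The sizes (978 L1000 landmark genes, a 64-dimensional drug embedding, a 32-dimensional latent), the single linear layer, and the names `encode_condition` and `kl_to_standard_normal` are illustrative assumptions, not TRIDENT's actual architecture. The decoder and its reconstruction target are omitted because the summary does not specify them.

```python
import numpy as np

# Hypothetical sizes (assumptions): 978 L1000 landmark genes,
# a 64-d drug embedding, and a 32-d RNA-drug latent z.
N_GENES, DRUG_DIM, LATENT_DIM = 978, 64, 32

rng = np.random.default_rng(0)

# Randomly initialised linear heads standing in for a trained encoder network.
W_mu = rng.normal(scale=0.01, size=(N_GENES + DRUG_DIM, LATENT_DIM))
W_logvar = rng.normal(scale=0.01, size=(N_GENES + DRUG_DIM, LATENT_DIM))


def encode_condition(expr, drug, rng):
    """Map pre-perturbation expression + drug features to a latent z.

    expr: (batch, N_GENES), drug: (batch, DRUG_DIM).
    Returns z, mu, logvar, each of shape (batch, LATENT_DIM).
    """
    h = np.concatenate([expr, drug], axis=-1)   # fuse the two inputs
    mu = h @ W_mu                               # posterior mean
    logvar = h @ W_logvar                       # posterior log-variance
    eps = rng.standard_normal(mu.shape)         # reparameterization trick
    z = mu + np.exp(0.5 * logvar) * eps
    return z, mu, logvar


def kl_to_standard_normal(mu, logvar):
    """KL(q(z|x) || N(0, I)), summed over latent dims, averaged over the batch."""
    kl = -0.5 * np.sum(1 + logvar - mu**2 - np.exp(logvar), axis=-1)
    return kl.mean()


expr = rng.standard_normal((4, N_GENES))   # toy expression profiles
drug = rng.standard_normal((4, DRUG_DIM))  # toy drug features
z, mu, logvar = encode_condition(expr, drug, rng)
print(z.shape, kl_to_standard_normal(mu, logvar) >= 0)
```

The KL term regularizes $z$ toward a standard normal, which is what lets a sampled latent serve as a smooth conditioning signal for the downstream diffusion model.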

Abstract

Accurately modeling the relationship between perturbations, transcriptional responses, and phenotypic changes is essential for building an AI Virtual Cell (AIVC). However, existing methods are typically constrained to modeling direct associations, such as Perturbation $\rightarrow$ RNA or Perturbation $\rightarrow$ Morphology, and overlook the crucial causal link from RNA to morphology. To bridge this gap, we propose TRIDENT, a cascade generative framework that synthesizes realistic cellular morphology by conditioning on both the perturbation and the corresponding gene expression profile. To train and evaluate this task, we construct MorphoGene, a new dataset pairing L1000 gene expression with Cell Painting images for 98 compounds. TRIDENT significantly outperforms state-of-the-art approaches, achieving up to a 7-fold improvement and generalizing well to unseen compounds. In a case study on docetaxel, we validate that RNA-guided synthesis accurately produces the corresponding phenotype. An ablation study further confirms that RNA conditioning is essential for the model's high fidelity. By explicitly modeling the transcriptome-to-phenome mapping, TRIDENT provides a powerful in silico tool and moves us closer to a predictive virtual cell.

Paper Structure

This paper contains 17 sections, 11 equations, 8 figures, 2 tables, and 2 algorithms.

Figures (8)

  • Figure 1: A comparison of cellular response modeling tasks. (Left) Predicting RNA from perturbation. (Middle) Predicting morphology from perturbation. (Right) Our model, TRIDENT, which integrates both perturbation and RNA to predict morphology, explicitly learning the RNA $\rightarrow$ Morphology relationship.
  • Figure 2: Overview of the TRIDENT framework. (a) A VAE maps high-resolution morphology images from pixel space into a compressed latent representation. (b) The Morphology Generation Module. A denoising transformer learns to reverse a forward noising process, using cross-attention to integrate a guiding condition vector that combines the RNA-drug latent with timestep information. (c) The Transcription-Drug Condition Module. A VAE-based module encodes pre-perturbation gene expression and drug information into a latent vector that guides image generation. (d) Symbol definitions.
  • Figure 3: Visual comparison of generated cellular morphologies under six drug perturbations. Ground-truth images (Row 1) are compared to outputs from TRIDENT (Row 2), MorphoDiff (Row 3), and Stable Diffusion (Row 4). See supplementary material for more results.
  • Figure 4: TRIDENT captures biologically interpretable signatures in embedding and feature space. (a) ViT embeddings of generated images form distinct, MOA-specific clusters in LDA space, with representative images shown for each cluster. (b) UMAP visualization confirms high distributional alignment between generated (green) and real (orange) images, which are both separate from the control (blue) population. (c) Quantitative CellProfiler analysis shows that AreaOccupied feature distributions of generated and real images are highly similar and distinct from control across all cellular compartments.
  • Figure 5: TRIDENT learns the association between transcriptome and morphology. (a) Heatmaps comparing predicted (left) versus ground-truth (right) gene expression log fold changes for 44 compounds. (b) Functional enrichment analysis of model-predicted genes for docetaxel identifies pathways consistent with its known MOA. (c) Visual comparison of TRIDENT-predicted morphology for docetaxel (bottom) versus control (top), correctly capturing the phenotype of reduced cell density.
  • ...and 3 more figures
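Figure 2(b) describes a denoiser that reverses a forward noising process while image tokens attend, via cross-attention, to a condition combining the RNA-drug latent with timestep information. The NumPy sketch below illustrates that conditioning path under stated assumptions: a DDPM-style linear beta schedule, a sinusoidal timestep embedding, and condition tokens formed by stacking a projection of $z$ with the timestep embedding. All sizes, weights, and names (`q_sample`, `timestep_embedding`, `cross_attention`) are hypothetical. The sketch omits the noise-prediction head and training loss and is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes (assumptions): 16 image-latent tokens of width 64,
# a 32-d RNA-drug latent z, and 1000 diffusion steps.
N_TOKENS, D_MODEL, LATENT_DIM, T_STEPS = 16, 64, 32, 1000

# Linear beta schedule as in DDPM (assumed; not stated in the summary).
betas = np.linspace(1e-4, 0.02, T_STEPS)
alpha_bar = np.cumprod(1.0 - betas)


def q_sample(x0, t, eps):
    """Forward noising: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps


def timestep_embedding(t, dim):
    """Sinusoidal timestep embedding; dim must be even."""
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    angles = t * freqs
    return np.concatenate([np.sin(angles), np.cos(angles)])


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


# Random projections standing in for trained weights.
W_z = rng.normal(scale=0.1, size=(LATENT_DIM, D_MODEL))
W_q, W_k, W_v = (rng.normal(scale=D_MODEL**-0.5, size=(D_MODEL, D_MODEL))
                 for _ in range(3))


def cross_attention(x_t, z, t):
    """Image tokens (queries) attend to condition tokens built from z and t."""
    cond = np.stack([z @ W_z, timestep_embedding(t, D_MODEL)])  # (2, D_MODEL)
    q, k, v = x_t @ W_q, cond @ W_k, cond @ W_v
    attn = softmax(q @ k.T / np.sqrt(D_MODEL), axis=-1)         # (N_TOKENS, 2)
    return x_t + attn @ v, attn                                  # residual update


x0 = rng.standard_normal((N_TOKENS, D_MODEL))  # toy image latent, flattened to tokens
z = rng.standard_normal(LATENT_DIM)            # toy RNA-drug latent
t = 500
x_t = q_sample(x0, t, rng.standard_normal(x0.shape))
out, attn = cross_attention(x_t, z, t)
print(out.shape, attn.shape)
```

Stacking $z$ and the timestep embedding as two separate key/value tokens lets each image token weight the RNA-drug signal and the noise-level signal differently. With a single condition token, the softmax would assign it weight 1 for every query, and the attention would reduce to the same additive shift for all tokens.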