Lingshu-Cell: A generative cellular world model for transcriptome modeling toward virtual cells

Han Zhang, Guo-Hua Yuan, Chaohao Yuan, Tingyang Xu, Tian Bian, Hong Cheng, Wenbing Huang, Deli Zhao, Yu Rong

Abstract

Modeling cellular states and predicting their responses to perturbations are central challenges in computational biology and the development of virtual cells. Existing foundation models for single-cell transcriptomics provide powerful static representations, but they do not explicitly model the distribution of cellular states for generative simulation. Here, we introduce Lingshu-Cell, a masked discrete diffusion model that learns transcriptomic state distributions and supports conditional simulation under perturbation. By operating directly in a discrete token space that is compatible with the sparse, non-sequential nature of single-cell transcriptomic data, Lingshu-Cell captures complex transcriptome-wide expression dependencies across approximately 18,000 genes without relying on prior gene selection, such as filtering by high variability or ranking by expression level. Across diverse tissues and species, Lingshu-Cell accurately reproduces transcriptomic distributions, marker-gene expression patterns and cell-subtype proportions, demonstrating its ability to capture complex cellular heterogeneity. Moreover, by jointly embedding cell type or donor identity with perturbation, Lingshu-Cell can predict whole-transcriptome expression changes for novel combinations of identity and perturbation. It achieves leading performance on the Virtual Cell Challenge H1 genetic perturbation benchmark and in predicting cytokine-induced responses in human PBMCs. Together, these results establish Lingshu-Cell as a flexible cellular world model for in silico simulation of cell states and perturbation responses, laying the foundation for a new paradigm in biological discovery and perturbation screening.

Paper Structure

This paper contains 29 sections, 27 equations, 11 figures, 4 tables.

Figures (11)

  • Figure 1: Overview of the Lingshu-Cell framework. a, Lingshu-Cell employs a masked discrete diffusion model to learn and generate single-cell transcriptomic data. In the forward process, gene expression values are progressively masked (from $t = 0$ to $t = T$); in the reverse process, the model iteratively predicts masked values to generate realistic scRNA-seq expression profiles. b, Comparison of generative paradigms. Unlike autoregressive (AR) models that rely on a fixed sequential order and denoising diffusion probabilistic models (DDPMs) that corrupt all positions with continuous noise, Lingshu-Cell randomly masks and predicts gene expression values in an order-independent manner, which is inherently compatible with the orderless structure of gene expression data. c, Application scenarios of Lingshu-Cell, including unconditional generation across diverse human tissues and species, and conditional generation for genetic perturbation and cytokine perturbation response prediction.
  • Figure 2: Unconditional generation of cell states across diverse species and tissues by Lingshu-Cell. a, UMAP visualization of real and generated cells (10,000 each, randomly sampled) from the PARSE-PBMC dataset, colored by cell type annotation (left) and normalized expression (log1p) of canonical marker genes for each cell type. b, Comparison of cell type proportions between real and generated data. c, Quantitative benchmark comparing Lingshu-Cell, scDiffusion and scVI across five metrics (Pearson correlation, Spearman correlation, MMD, 1-WD and iLISI) on the PARSE-PBMC dataset. d, Unconditional generation results across human tissues, including neocortex, heart, lung and colon, with UMAP plots showing real (top) and generated (bottom) cells colored by cell type. e, Unconditional generation results across multiple species, including mouse, rhesus macaque, zebrafish and fly.
  • Figure 3: Unconditional generation performance of Lingshu-Cell across human tissues and non-human species.
  • Figure 4: Accurate prediction of single-cell transcriptomic responses to genetic perturbations in cell lines by Lingshu-Cell. a, Schematic of CRISPR-based genetic perturbation and the resulting transcriptomic changes. b, Conditional generation framework for perturbation prediction. Cell type and perturbation target are provided as conditioning inputs, and a masked diffusion model iteratively predicts gene expression values to generate perturbation-specific expression profiles. c, Three design components of Lingshu-Cell: classifier-free guidance (CFG), sequence compression, and biological prior injection (see Methods). d, Ablation study of CFG guidance weight. Bar plots show prediction performance across eight metrics (DES, PDS, MAE, Spearman #DEG, Spearman LFC, AUPRC, Pearson-$\Delta$, and average score) on the H1 test set ($n = 100$ perturbation targets). e, Ablation study of sequence compression, comparing uncompressed input with patch sizes of 8 and 32. f, Ablation study of biological prior injection, comparing prediction performance with and without prior injection. In d--f, metrics highlighted in red denote improved performance relative to the corresponding baseline in each ablation setting.
  • Figure 5: Genetic perturbation prediction on the VCC leaderboard [roohani2025virtual]. Teams are ordered by final ranking. Avg Rank: average rank across the top 25 teams. Best per column in bold. See the full leaderboard table (tab:vcc-leaderboard-full) for the complete top-25 ranking.
  • ...and 6 more figures
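
The masked discrete diffusion process described in the Figure 1 caption (progressive masking from $t = 0$ to $t = T$, followed by iterative, order-independent prediction of masked values) can be illustrated with a minimal sketch. This is not the authors' implementation: the linear masking schedule, the `MASK` sentinel, the uniform unmasking rule, and the `toy_predict` stand-in for the learned denoiser are all assumptions made for illustration.

```python
import numpy as np

MASK = -1  # hypothetical sentinel token for a masked gene position


def forward_mask(tokens, t, T, rng):
    """Forward process: mask each position independently with probability t / T.

    The linear schedule is an assumption; t = 0 leaves the profile intact
    and t = T masks every position.
    """
    tokens = np.asarray(tokens)
    mask = rng.random(tokens.shape) < (t / T)
    return np.where(mask, MASK, tokens)


def reverse_generate(n_genes, T, predict, rng):
    """Reverse process: start fully masked and reveal positions in random order.

    `predict` is a hypothetical callable that maps a partially masked profile
    to a proposed token for every position.
    """
    x = np.full(n_genes, MASK, dtype=np.int64)
    for t in range(T, 0, -1):
        masked = np.flatnonzero(x == MASK)
        if masked.size == 0:
            break
        # Reveal roughly 1/t of the remaining masked positions,
        # so that everything is unmasked by the final step (t = 1).
        k = int(np.ceil(masked.size / t))
        chosen = rng.choice(masked, size=k, replace=False)
        proposal = predict(x)
        x[chosen] = proposal[chosen]
    return x


def toy_predict(x):
    # Stand-in for the learned denoiser: always proposes expression token 0.
    return np.zeros_like(x)
```

Because positions are chosen at random rather than in a fixed sequence, the sketch reflects the order-independent generation that the caption contrasts with autoregressive models. In a real model, `predict` would sample from the network's per-gene distribution over discrete expression tokens, optionally conditioned on cell type and perturbation.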