VCWorld: A Biological World Model for Virtual Cell Simulation
Zhijian Wei, Runze Ma, Zichen Wang, Zhongmin Li, Shuotong Song, Shuangjia Zheng
TL;DR
VCWorld is a data-efficient, white-box approach to predicting cellular responses that integrates structured biological knowledge with LLM-based reasoning. It reframes perturbation prediction as a set of gene-centric tasks and uses an open-world knowledge graph with retrieval-augmented, chain-of-thought reasoning to produce interpretable predictions and mechanistic hypotheses. The GeneTAK benchmark supports evaluation of gene-level perturbation effects; on it, VCWorld achieves state-of-the-art performance on the DE and DIR tasks, and in case studies its inferred mechanisms align with published biological evidence. The work emphasizes interpretability and biological grounding, offering a path toward credible in silico modeling of cellular perturbations and hypothesis generation.
Abstract
Virtual cell modeling aims to predict cellular responses to perturbations. Existing virtual cell models rely heavily on large-scale single-cell datasets, learning explicit mappings between gene expression and perturbations. Although recent models attempt to incorporate multi-source biological information, their generalization remains constrained by data quality, coverage, and batch effects. More critically, these models often function as black boxes, offering predictions without interpretability or consistency with biological principles, which undermines their credibility in scientific research. To address these challenges, we present VCWorld, a cell-level white-box simulator that integrates structured biological knowledge with the iterative reasoning capabilities of large language models to instantiate a biological world model. VCWorld operates in a data-efficient manner to reproduce perturbation-induced signaling cascades and generates interpretable, stepwise predictions alongside explicit mechanistic hypotheses. In drug perturbation benchmarks, VCWorld achieves state-of-the-art predictive performance, and the inferred mechanistic pathways are consistent with publicly available biological evidence.
