CellForge: Agentic Design of Virtual Cell Models

Xiangru Tang, Zhuoyun Yu, Jiapeng Chen, Yan Cui, Daniel Shao, Weixu Wang, Fang Wu, Yuchen Zhuang, Wenqi Shi, Zhi Huang, Arman Cohan, Xihong Lin, Fabian Theis, Smita Krishnaswamy, Mark Gerstein

TL;DR

CellForge is a fully autonomous multi-agent framework that designs and implements executable neural architectures for single-cell perturbation prediction. It couples three modules: Task Analysis, a Design Module built on graph-based expert collaboration, and Experiment Execution for end-to-end code generation and validation. Across six single-cell perturbation datasets, the generated models are competitive with established baselines and exhibit novel architectural motifs such as trajectory-aware encoders and perturbation diffusion modules. The work emphasizes knowledge-grounded retrieval, rigorous task formulation, and automated biological and methodological validation, which the authors present as a paradigm shift toward autonomous scientific method development in computational biology. Collectively, the results indicate that collaborative agentic design can produce high-quality, executable methods that match or outperform human-designed baselines, with interpretability supported by the novel components and the evaluation framework.

Abstract

Virtual cell modeling aims to predict cellular responses to diverse perturbations but faces challenges from biological complexity, multimodal data heterogeneity, and the need for interdisciplinary expertise. We introduce CellForge, a multi-agent framework that autonomously designs and synthesizes neural network architectures tailored to specific single-cell datasets and perturbation tasks. Given raw multi-omics data and task descriptions, CellForge discovers candidate architectures through collaborative reasoning among specialized agents, then generates executable implementations. Our core contribution is the framework itself: showing that multi-agent collaboration mechanisms - rather than manual human design or single-LLM prompting - can autonomously produce executable, high-quality computational methods. This approach goes beyond conventional hyperparameter tuning by enabling entirely new architectural components such as trajectory-aware encoders and perturbation diffusion modules to emerge from agentic deliberation. We evaluate CellForge on six datasets spanning gene knockouts, drug treatments, and cytokine stimulations across multiple modalities (scRNA-seq, scATAC-seq, CITE-seq). The results demonstrate that the models generated by CellForge are highly competitive with established baselines, while revealing systematic patterns of architectural innovation. CellForge highlights the scientific value of multi-agent frameworks: collaboration among specialized agents enables genuine methodological innovation and executable solutions that single agents or human experts cannot achieve. This represents a paradigm shift toward autonomous scientific method development in computational biology. Code is available at https://github.com/gersteinlab/CellForge.

Paper Structure

This paper contains 181 sections, 35 equations, 19 figures, 25 tables, 3 algorithms.

Figures (19)

  • Figure 1: (a) Perturbation prediction learns mappings from control cell states to post-perturbation states in high-dimensional expression space. (b) Models train on control-perturbed cell pairs across modalities (scRNA-seq, scATAC-seq, CITE-seq) to predict responses to unseen perturbations. (c) CellForge receives datasets and task descriptions, autonomously designing models for predicting expression under novel perturbations ($p_i \in \mathcal{P}_{\text{test}}$). (d) System workflow.
  • Figure 2: Multi-agent collaboration generates scientific research artifacts. Task Analysis produces dataset characterization and literature-grounded insights, Design Module synthesizes novel methodological approaches through structured agent discussions, and Experiment Execution demonstrates code generation capability.
  • Figure 3: The CellForge architecture and workflow.
  • Figure 4: The graph-based discussion architecture and workflow. The example shows the first two rounds of a discussion. After each round, confidence scores are updated and the agentic system judges whether the current state satisfies the stopping criteria. If not, each expert refines their ideas based on the critic agent's suggestions and the other experts' viewpoints. This graph-based critic refinement continues until the termination state is reached. The figure includes an example formula for computing each expert's confidence score per round, based on a weighted combination of historical scores, peer evaluations, and the critic agent's assessments. The complete multi-round discussions are presented in Appendix \ref{app:graph_discussion}.
  • Figure 5: Confidence Score Update in Graph-based Expert Discussion. This figure illustrates an example of how a domain expert’s confidence score evolves during iterative rounds of discussion in the Graph-based Expert Discussion framework. While this example focuses on the Model Architecture Expert, the same confidence updating process applies to all participating experts in the graph, each iteratively refining their proposals and adjusting their confidence based on multi-agent evaluations.
  • ...and 14 more figures
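
The confidence update described in the Figure 4 and Figure 5 captions (a weighted combination of an expert's historical score, peer evaluations, and the critic agent's assessment, followed by a stopping check) can be sketched as below. This is only an illustrative sketch, not the paper's formula: the weight values, the 0.8 threshold, the round limit, and the names `ConfidenceWeights`, `update_confidence`, and `should_stop` are all assumptions introduced here, since the captions do not state them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfidenceWeights:
    """Hypothetical weights; the captions do not give actual values."""
    history: float = 0.3
    peers: float = 0.3
    critic: float = 0.4


def update_confidence(previous, peer_scores, critic_score,
                      weights=ConfidenceWeights()):
    """Return a weighted average of the three signals named in the caption.

    previous     -- the expert's confidence from the prior round
    peer_scores  -- scores assigned by the other experts this round
    critic_score -- the critic agent's assessment this round
    """
    # With no peer feedback, fall back to the expert's previous score.
    peer_mean = sum(peer_scores) / len(peer_scores) if peer_scores else previous
    total = weights.history + weights.peers + weights.critic
    return (weights.history * previous
            + weights.peers * peer_mean
            + weights.critic * critic_score) / total


def should_stop(confidences, round_index, threshold=0.8, max_rounds=10):
    """Assumed stopping rule: every expert is confident, or rounds run out."""
    all_confident = all(c >= threshold for c in confidences.values())
    return all_confident or round_index >= max_rounds


if __name__ == "__main__":
    score = update_confidence(0.5, [0.6, 0.8], 0.9)
    print(round(score, 2))
    print(should_stop({"architecture_expert": score}, round_index=1))
```

Normalizing by the sum of the weights keeps the updated score in the same range as its inputs even if the weights do not sum to one; the actual system may combine the signals differently.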