CellAgent: An LLM-driven Multi-Agent Framework for Automated Single-cell Data Analysis
Yihang Xiao, Jinyi Liu, Yan Zheng, Xiaohan Xie, Jianye Hao, Mingzhi Li, Ruitao Wang, Fei Ni, Yuxiao Li, Jintian Luo, Shaoqing Jiao, Jiajie Peng
TL;DR
This work tackles the complexity and high skill barrier of single-cell RNA-seq (scRNA-seq) data analysis by introducing CellAgent, an LLM-driven multi-agent framework that automates end-to-end workflows. It organizes three biological expert roles (Planner, Executor, and Evaluator) under a hierarchical decision-making mechanism and a self-iterative optimization loop, aiming for high-quality results without human intervention. Across batch correction, cell type annotation, and trajectory inference, CellAgent performs robustly, with a 92% task completion rate and stronger metrics on multiple datasets, often surpassing single-model baselines. The approach significantly reduces the manual workload of scRNA-seq analysis and supports broader adoption of AI-assisted science, while leaving room for future enhancements such as diversified self-evaluation strategies and tool integration.
Abstract
Single-cell RNA sequencing (scRNA-seq) data analysis is crucial for biological research, as it enables the precise characterization of cellular heterogeneity. However, manually operating the various tools needed to reach a desired outcome is labor-intensive for researchers. To address this, we introduce CellAgent (http://cell.agent4science.cn/), an LLM-driven multi-agent framework specifically designed to process and execute scRNA-seq data analysis tasks automatically, providing high-quality results with no human intervention. First, to adapt general LLMs to the biological field, CellAgent constructs LLM-driven biological expert roles - planner, executor, and evaluator - each with specific responsibilities. Then, CellAgent introduces a hierarchical decision-making mechanism to coordinate these biological experts, effectively driving the planning and step-by-step execution of complex data analysis tasks. Furthermore, we propose a self-iterative optimization mechanism that enables CellAgent to autonomously evaluate and optimize solutions, thereby guaranteeing output quality. We evaluate CellAgent on a comprehensive benchmark dataset encompassing dozens of tissues and hundreds of distinct cell types. Evaluation results consistently show that CellAgent effectively identifies the most suitable tools and hyperparameters for single-cell analysis tasks, achieving optimal performance. This automated framework dramatically reduces the workload of scientific data analysis, bringing us into the "Agent for Science" era.
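The planner, executor, and evaluator loop described in the abstract can be illustrated with a minimal Python sketch. It is not CellAgent's implementation. The three roles are replaced by plain functions rather than LLM calls, and every name here (`plan`, `solve_step`, `run_pipeline`, `tool_a` through `tool_d`, the `threshold` parameter) and every score is a made-up assumption chosen only to show the control flow: decompose a task into steps, try candidate tools for each step, score each attempt, and stop once a score is good enough.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class StepResult:
    step: str      # sub-task name produced by the planner
    tool: str      # best-scoring tool found for this step
    score: float   # evaluator score of that tool's output
    attempt: int   # 1-based attempt at which the best tool was tried


def plan(task: str) -> List[str]:
    # Planner stand-in: a real planner would ask an LLM to decompose the task.
    # Here the task string is simply split on "->".
    return [s.strip() for s in task.split("->") if s.strip()]


def solve_step(step: str, tools: List[str],
               execute: Callable[[str, str], object],
               evaluate: Callable[[str, object], float],
               threshold: float = 0.8) -> StepResult:
    # Self-iterative optimization stand-in: try candidates in order, keep the
    # best-scoring one, and stop early once the (assumed) threshold is met.
    if not tools:
        raise ValueError(f"no candidate tools for step {step!r}")
    best: Optional[StepResult] = None
    for attempt, tool in enumerate(tools, start=1):
        output = execute(step, tool)      # executor role
        score = evaluate(step, output)    # evaluator role
        if best is None or score > best.score:
            best = StepResult(step, tool, score, attempt)
        if score >= threshold:
            break
    return best


def run_pipeline(task: str, tools_for: Dict[str, List[str]],
                 execute: Callable[[str, str], object],
                 evaluate: Callable[[str, object], float],
                 threshold: float = 0.8) -> List[StepResult]:
    # Hierarchical flow: plan first, then solve each step in order.
    return [solve_step(s, tools_for[s], execute, evaluate, threshold)
            for s in plan(task)]


# Toy usage with hypothetical tools and made-up scores (not real results).
TOY_SCORES = {
    ("batch correction", "tool_a"): 0.55,
    ("batch correction", "tool_b"): 0.90,
    ("cell type annotation", "tool_c"): 0.85,
    ("cell type annotation", "tool_d"): 0.95,
}


def toy_execute(step: str, tool: str) -> object:
    return (step, tool)  # a real executor would generate and run analysis code


def toy_evaluate(step: str, output: object) -> float:
    return TOY_SCORES[output]  # a real evaluator would judge the actual output


results = run_pipeline(
    "batch correction -> cell type annotation",
    {"batch correction": ["tool_a", "tool_b"],
     "cell type annotation": ["tool_c", "tool_d"]},
    toy_execute, toy_evaluate,
)
for r in results:
    print(r)
```

In this toy run the first step rejects `tool_a` and settles on `tool_b` at the second attempt, while the second step accepts `tool_c` immediately and never tries `tool_d`. Early stopping at a threshold is only one possible design. A loop that always tries every candidate would trade extra compute for a potentially better final choice.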
