CellMaster: Collaborative Cell Type Annotation in Single-Cell Analysis
Zhen Wang, Yiming Gao, Jieyuan Liu, Enze Ma, Jefferson Chen, Mark Antkowiak, Mengzhou Hu, JungHo Kong, Dexter Pratt, Zhiting Hu, Wei Wang, Trey Ideker, Eric P. Xing
TL;DR
CellMaster addresses the bottleneck of cell-type annotation in large-scale scRNA-seq by introducing a zero-shot, LLM-driven agent that mimics expert reasoning and provides interpretable rationales. Its iterative four-stage pipeline (hypothesis generation, marker selection, expression analysis, and result evaluation) supports both automatic and human-in-the-loop modes. In benchmarks across nine datasets, CellMaster improves accuracy by 7.1% over the best-performing baselines in automatic mode and by 18.6% with human feedback, while handling rare and novel cell states more robustly. The work highlights the value of transparent AI-assisted workflows with provenance-tracked human collaboration for scalable, biologically faithful annotation in evolving single-cell atlases, and provides open-source tools for broad adoption.
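The four-stage loop named above can be pictured with a minimal sketch. This is an illustration only, not the CellMaster implementation: every name in it (`ClusterState`, `annotate`, the three callables, `threshold`, `max_rounds`) is hypothetical, the LLM calls are replaced by plain Python callables, and the toy expression values are invented for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical names throughout; nothing here is taken from the CellMaster codebase.


@dataclass
class ClusterState:
    cluster_id: str
    label: str = "unknown"
    rationale: List[str] = field(default_factory=list)
    confident: bool = False


def annotate(
    cluster_ids: List[str],
    propose_hypotheses: Callable[[str], List[str]],
    select_markers: Callable[[str], List[str]],
    score_expression: Callable[[str, List[str]], float],
    threshold: float = 0.5,  # assumed acceptance cutoff
    max_rounds: int = 3,     # assumed iteration budget
) -> Dict[str, ClusterState]:
    states = {c: ClusterState(c) for c in cluster_ids}
    for round_idx in range(max_rounds):
        pending = [s for s in states.values() if not s.confident]
        if not pending:
            break
        for state in pending:
            # Stage 1: hypothesis generation (an LLM call in the real system).
            candidates = propose_hypotheses(state.cluster_id)
            best_label, best_score = state.label, float("-inf")
            for cell_type in candidates:
                # Stage 2: marker selection for the candidate cell type.
                markers = select_markers(cell_type)
                # Stage 3: expression analysis of those markers in the cluster.
                score = score_expression(state.cluster_id, markers)
                if score > best_score:
                    best_label, best_score = cell_type, score
            # Stage 4: result evaluation; unresolved clusters are retried next round.
            if candidates:
                state.label = best_label
                state.confident = best_score >= threshold
                state.rationale.append(
                    f"round {round_idx}: {best_label} scored {best_score:.2f}"
                )
    return states


# Toy inputs (made-up mean expression per cluster) standing in for real data.
toy_expression = {"c0": {"CD3E": 0.9, "CD19": 0.1}, "c1": {"CD3E": 0.0, "CD19": 0.8}}
toy_markers = {"T cell": ["CD3E"], "B cell": ["CD19"]}

result = annotate(
    ["c0", "c1"],
    propose_hypotheses=lambda cid: list(toy_markers),
    select_markers=lambda cell_type: toy_markers[cell_type],
    score_expression=lambda cid, ms: sum(
        toy_expression[cid].get(m, 0.0) for m in ms
    ) / len(ms),
)
```

In this sketch, a human-in-the-loop mode would amount to swapping one of the callables (for example `propose_hypotheses`) for a function that consults a reviewer, while the `rationale` list keeps a record of why each label was chosen.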
Abstract
Single-cell RNA-seq (scRNA-seq) enables atlas-scale profiling of complex tissues, revealing rare lineages and transient states. Yet assigning biologically valid cell identities remains a bottleneck because markers are tissue- and state-dependent, and novel states lack references. We present CellMaster, an AI agent that mimics expert practice for zero-shot cell-type annotation. Unlike existing automated tools, CellMaster leverages knowledge encoded in LLMs (e.g., GPT-4o) to perform on-the-fly annotation with interpretable rationales, without pre-training or fixed marker databases. Across 9 datasets spanning 8 tissues, CellMaster improved accuracy by 7.1% over the best-performing baselines (including CellTypist and scTab) in automatic mode. With human-in-the-loop refinement, this advantage increased to 18.6%, with a 22.1% gain on subtype populations. The system is particularly strong on rare and novel cell states, where baselines often fail. Source code and the web application are available at https://github.com/AnonymousGym/CellMaster.
