LLM4Cell: A Survey of Large Language and Agentic Models for Single-Cell Biology

Sajib Acharjee Dip, Adrika Zafor, Bikash Kumar Paul, Uddip Acharjee Shuvo, Muhit Islam Emon, Xuan Wang, Liqing Zhang

TL;DR

LLM4Cell addresses the fragmentation among large language and agentic models for single-cell biology by surveying 58 methods across five families and eight analytical tasks, grounded in 40+ public datasets. It provides a unified taxonomy and a ten-dimension domain rubric to evaluate biological grounding, fairness, and scalability, and it discusses open challenges in cross-modal integration, interpretability, and trustworthy AI. The work highlights a shift from purely statistical representations to language-grounded, interpretable, and increasingly autonomous single-cell intelligence, and it offers a reproducible reference to benchmark, compare, and guide future model design. Together, these contributions establish a foundation for standardized cross-modal benchmarking, cross-species generalization, and responsible development of cellular foundation and agentic models.

Abstract

Large language models (LLMs) and emerging agentic frameworks are beginning to transform single-cell biology by enabling natural-language reasoning, generative annotation, and multimodal data integration. However, progress remains fragmented across data modalities, architectures, and evaluation standards. LLM4Cell presents the first unified survey of 58 foundation and agentic models developed for single-cell research, spanning RNA, ATAC, multi-omic, and spatial modalities. We categorize these methods into five families (foundation, text-bridge, spatial, multimodal, epigenomic, and agentic) and map them to eight key analytical tasks including annotation, trajectory and perturbation modeling, and drug-response prediction. Drawing on over 40 public datasets, we analyze benchmark suitability, data diversity, and ethical or scalability constraints, and evaluate models across 10 domain dimensions covering biological grounding, multi-omics alignment, fairness, privacy, and explainability. By linking datasets, models, and evaluation domains, LLM4Cell provides the first integrated view of language-driven single-cell intelligence and outlines open challenges in interpretability, standardization, and trustworthy model development.

Paper Structure

This paper contains 60 sections and 5 figures.

Figures (5)

  • Figure 1: Hierarchical taxonomy for LLM4Cell. Color-coded families expand into sub-branches and representative models, tracing the progression from foundation pretraining to multimodal and agentic reasoning frameworks. References are omitted for visibility and included in the appendix method comparison table.
  • Figure 2: Task and domain distribution across spatial transcriptomics models. Most emphasize biological grounding, scalability, and annotation tasks.
  • Figure 10: Task vs Model Heatmap
  • Figure 11: Comparison of task coverage between agentic (dark blue) and non-agentic (yellow) models. Agentic frameworks emphasize annotation, ontology mapping, and spatial mapping, while non-agentic models also concentrate on trajectory modeling, perturbation modeling, and regulatory and pathway inference.
  • Figure 12: Comparison of domain coverage between agentic (dark blue) and non-agentic (yellow) models. Agentic frameworks emphasize explainability, fairness, and emerging paradigms, while non-agentic models concentrate on biological grounding and batch effects.