
Small Agent Group is the Future of Digital Health

Yuqiao Meng, Luoxi Tang, Dazheng Zhang, Rafael Brens, Elvys J. Romero, Nancy Guo, Safa Elkefi, Zhaohan Xi

TL;DR

The paper tackles the limitations of the scaling-first approach in digital health by proposing a Small Agent Group (SAG) that distributes clinical reasoning across specialized, collaborative agents. SAG uses a multi-agent debate framework with distinct roles for reasoning, knowledge retrieval, safety auditing, and synthesis, enhanced by retrieval-augmented generation and centralized training with decentralized execution. Across diverse benchmarks, SAG delivers superior diagnostic accuracy, evidence grounding, safety, robustness, and fairness, while offering a more deployment-friendly resource profile than giant LLMs. The work demonstrates that collaborative small-model reasoning can substitute for parameter growth, providing a scalable, reliable, and practically deployable path for AI-powered clinical decision support.

Abstract

The rapid adoption of large language models (LLMs) in digital health has been driven by a "scaling-first" philosophy, i.e., the assumption that clinical intelligence increases with model size and data. However, real-world clinical needs include not only effectiveness, but also reliability and reasonable deployment cost. Since clinical decision-making is inherently collaborative, we challenge the monolithic scaling paradigm and ask whether a Small Agent Group (SAG) can support better clinical reasoning. SAG shifts from single-model intelligence to collective expertise by distributing reasoning, evidence-based analysis, and critical audit through a collaborative deliberation process. To assess the clinical utility of SAG, we conduct extensive evaluations using diverse clinical metrics spanning effectiveness, reliability, and deployment cost. Our results show that SAG achieves superior performance compared to a single giant model, both with and without additional optimization or retrieval-augmented generation. These findings suggest that the synergistic reasoning represented by SAG can substitute for model parameter growth in clinical settings. Overall, SAG offers a scalable solution to digital health that better balances effectiveness, reliability, and deployment efficiency.

Paper Structure

This paper contains 62 sections, 12 equations, 7 figures, and 10 tables.

Figures (7)

  • Figure 1: Illustration of clinical reasoning performance between a single LLM and a collaborative agent group.
  • Figure 2: Overview of the outlined SAG architecture and workflow. To be representative, we combine a symmetric agent debate (between $A_R$ and $A_K$) with sequential execution (by $A_S$ and $A_J$) to align with clinical needs for real-world evidence retrieval, safety checking, and final judgment. We further adopt multi-round iterative execution to ensure information flows through the entire SAG system. To control latency, we adopt early termination when no further substantive arguments are raised during the agent discussions.
  • Figure 3: Clinical relevance landscape (Qwen-based). Circles denote clinical baseline models, squares denote Qwen models, and stars denote our SAG variants. The corresponding Llama-based results are provided in the appendix.
  • Figure 4: Safety ROC analysis. True positive rate (harm refusal) vs. false positive rate (over-refusal). Solid lines denote SAG variants.
  • Figure 5: Robustness analysis. Performance under Clean (original), Ling. (linguistic noise), and Adv. (adversarial distractors).
  • ...and 2 more figures
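
The workflow in the Figure 2 caption can be sketched as a small control loop. This is only an illustration of one plausible reading of the caption, not the authors' implementation. The assumptions are: $A_R$ is the reasoning agent, $A_K$ the knowledge-retrieval agent, $A_S$ the safety auditor, and $A_J$ the agent that issues the final judgment; an argument counts as "substantive" when it is non-empty and has not been raised before; and each round runs the debate first and then $A_S$ and $A_J$ in sequence. The function `run_sag`, the `Agent` type, and the toy agents are hypothetical names introduced for this sketch.

```python
from typing import Callable, List, Optional, Tuple

Turn = Tuple[str, str]                    # (agent name, message)
Agent = Callable[[str, List[Turn]], str]  # hypothetical agent interface


def run_sag(question: str, a_r: Agent, a_k: Agent, a_s: Agent, a_j: Agent,
            max_rounds: int = 3) -> Tuple[Optional[str], int]:
    """Run a debate, audit, judgment loop with early termination."""
    transcript: List[Turn] = []
    seen = set()
    answer: Optional[str] = None
    rounds = 0
    for _ in range(max_rounds):
        # Symmetric debate: A_R and A_K both see the same snapshot.
        snapshot = list(transcript)
        fresh: List[Turn] = []
        for name, agent in (("A_R", a_r), ("A_K", a_k)):
            msg = agent(question, snapshot).strip()
            # Assumption: "substantive" means non-empty and not seen before.
            if msg and msg not in seen:
                seen.add(msg)
                fresh.append((name, msg))
        # Early termination: nothing new was raised and a judgment exists.
        if not fresh and answer is not None:
            break
        rounds += 1
        transcript.extend(fresh)
        # Sequential execution: safety audit, then final judgment.
        transcript.append(("A_S", a_s(question, transcript)))
        answer = a_j(question, transcript)
        transcript.append(("A_J", answer))
    return answer, rounds


# Toy stand-ins for model calls; they return placeholder strings only.
def toy_reasoner(question: str, transcript: List[Turn]) -> str:
    return "hypothesis H1"


def toy_retriever(question: str, transcript: List[Turn]) -> str:
    return "evidence E1"


def toy_auditor(question: str, transcript: List[Turn]) -> str:
    return "no safety concern raised"


def toy_judge(question: str, transcript: List[Turn]) -> str:
    debate = [m for n, m in transcript if n in ("A_R", "A_K")]
    return f"Synthesis of {len(debate)} arguments"


if __name__ == "__main__":
    print(run_sag("example question", toy_reasoner, toy_retriever,
                  toy_auditor, toy_judge))
```

Because the toy agents repeat themselves, the second round raises nothing new and the loop stops after one full round. In a real system each agent would wrap a small language model call, and the definition of "substantive" would need something more robust than exact string matching.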