Agentic AI in Healthcare & Medicine: A Seven-Dimensional Taxonomy for Empirical Evaluation of LLM-based Agents

Shubham Vatsal, Harsh Dubey, Aditi Singh

TL;DR

The paper tackles the absence of a unified framework for evaluating agentic AI in healthcare by introducing a seven-dimensional taxonomy with 29 sub-dimensions and a consistent labeling rubric. It empirically analyzes 49 healthcare LLM-based agent studies, mapping them to the taxonomy to reveal prevalence and co-occurrence patterns, such as strong External Knowledge Integration but weak Dynamic Updates & Forgetting, and dominant Multi-Agent Design with partial centralized orchestration. The findings show information-centric tasks like documentation and QA are more mature than action-oriented tasks like Treatment Planning, highlighting gaps in safety, adaptation, and governance necessary for clinical deployment. The work provides a structured baseline to guide future research toward reliable, ethical, and scalable Agentic AI in healthcare, with implications for design choices, evaluation standards, and regulatory compliance.

Abstract

Large Language Model (LLM)-based agents that plan, use tools and act have begun to shape healthcare and medicine. Reported studies demonstrate competence on tasks ranging from EHR analysis and differential diagnosis to treatment planning and research workflows. Yet the literature consists largely of overviews that are either broad surveys or narrow dives into a single capability (e.g., memory, planning, reasoning), leaving healthcare work without a common frame. We address this by reviewing 49 studies using a seven-dimensional taxonomy: Cognitive Capabilities, Knowledge Management, Interaction Patterns, Adaptation & Learning, Safety & Ethics, Framework Typology and Core Tasks & Subtasks, with 29 operational sub-dimensions. Using explicit inclusion and exclusion criteria and a labeling rubric (Fully Implemented, Partially Implemented, Not Implemented), we map each study to the taxonomy and report quantitative summaries of capability prevalence and co-occurrence patterns. Our empirical analysis surfaces clear asymmetries. For instance, the External Knowledge Integration sub-dimension under Knowledge Management is commonly realized (~76% Fully Implemented), whereas the Event-Triggered Activation sub-dimension under Interaction Patterns is largely absent (~92% Not Implemented) and the Drift Detection & Mitigation sub-dimension under Adaptation & Learning is rare (~98% Not Implemented). Architecturally, the Multi-Agent Design sub-dimension under Framework Typology is the dominant pattern (~82% Fully Implemented), while orchestration layers remain mostly partial. Across Core Tasks & Subtasks, information-centric capabilities lead, e.g., Medical Question Answering & Decision Support and Benchmarking & Simulation, while action- and discovery-oriented areas such as Treatment Planning & Prescription still show substantial gaps (~59% Not Implemented).
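To make the "prevalence" and "co-occurrence" summaries concrete, here is a minimal Python sketch of how such figures could be computed from rubric labels. It is an illustration only, not the authors' analysis code: the function names `prevalence` and `co_occurrence`, the dictionary layout, and the four toy studies are all assumptions introduced here, and the toy labels are made up rather than taken from the paper.

```python
from collections import Counter
from itertools import combinations

# The three rubric labels named in the abstract.
LABELS = ("Fully Implemented", "Partially Implemented", "Not Implemented")


def prevalence(labels_by_study, sub_dimension):
    """Return the percentage of studies carrying each rubric label
    for a single sub-dimension."""
    counts = Counter(study[sub_dimension] for study in labels_by_study.values())
    total = len(labels_by_study)
    return {label: 100.0 * counts[label] / total for label in LABELS}


def co_occurrence(labels_by_study, label="Fully Implemented"):
    """Count, for every pair of sub-dimensions, the number of studies
    in which both sub-dimensions carry the given label."""
    pairs = Counter()
    for study in labels_by_study.values():
        hits = sorted(dim for dim, value in study.items() if value == label)
        pairs.update(combinations(hits, 2))
    return pairs


# Hypothetical toy labels for four invented studies (NOT the paper's data).
toy = {
    "study_a": {"External Knowledge Integration": "Fully Implemented",
                "Multi-Agent Design": "Fully Implemented",
                "Drift Detection & Mitigation": "Not Implemented"},
    "study_b": {"External Knowledge Integration": "Fully Implemented",
                "Multi-Agent Design": "Fully Implemented",
                "Drift Detection & Mitigation": "Not Implemented"},
    "study_c": {"External Knowledge Integration": "Fully Implemented",
                "Multi-Agent Design": "Partially Implemented",
                "Drift Detection & Mitigation": "Not Implemented"},
    "study_d": {"External Knowledge Integration": "Partially Implemented",
                "Multi-Agent Design": "Fully Implemented",
                "Drift Detection & Mitigation": "Not Implemented"},
}

eki = prevalence(toy, "External Knowledge Integration")
pairs = co_occurrence(toy)
print(eki)
print(pairs)
```

With 49 studies instead of four, the same `prevalence` call would yield the kind of per-label percentages the abstract reports, and `co_occurrence` would give the raw pair counts behind a co-occurrence table.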

Paper Structure

This paper contains 50 sections, 11 figures, and 8 tables.

Figures (11)

  • Figure 1: Timeline Diagram of All the Papers Evaluated in Our Work. Refer to Table \ref{tab:cognitive} for Mapping Between Paper Names and Corresponding Citations.
  • Figure 2: PRISMA Flow Diagram of the Study Selection Process
  • Figure 3: Taxonomy Diagram of Dimensions Cognitive Capabilities, Knowledge Management, Interaction Patterns, Adaptation & Learning, Safety & Ethics, Framework Typology and their Corresponding Sub-Dimensions with Research Papers Rated Fully Implemented. Refer to Table \ref{tab:cognitive} for Mapping Between Paper Names and Corresponding Citations.
  • Figure 4: Taxonomy Diagram of Dimension Core Tasks & Subtasks and its Corresponding Sub-Dimensions with Research Papers Rated Fully Implemented. Refer to Table \ref{tab:cognitive} for Mapping Between Paper Names and Corresponding Citations.
  • Figure 5: Distribution of Labels Across Sub-Dimensions of Cognitive Capabilities
  • ...and 6 more figures