From Passive to Proactive: A Multi-Agent System with Dynamic Task Orchestration for Intelligent Medical Pre-Consultation

ChengZhang Yu, YingRu He, Hongyan Cheng, Nuo Cheng, Zhixing Liu, Dongxu Mu, Zhangrui Shen, Zhanpeng Jin

TL;DR

The paper tackles the challenge of extremely short primary care visits by transforming passive AI pre-consultation into proactive, structured inquiry using a hierarchical eight-agent system coordinated by a central Controller. It decomposes pre-consultation into four primary tasks (T1–T4) with 13 domain-specific subtasks and validates a model-agnostic, privacy-preserving architecture on 1,372 real EHRs across multiple foundation models, achieving high triage accuracy and near-complete task completion. Key contributions include dynamic subtask evaluation, adaptive prompt generation, and hierarchical task management that balance macro-diagnostic progression with micro-level symptom collection, demonstrated through rigorous ablations and real-world physician evaluation. The approach promises to reduce physician workload while maintaining, or improving, pre-consultation quality, enabling scalable, autonomous AI-driven intake in clinical settings.

Abstract

Global healthcare systems face critical challenges from increasing patient volumes and limited consultation times, with primary care visits averaging under 5 minutes in many countries. While pre-consultation processes encompassing triage and structured history-taking offer potential solutions, they remain limited by passive interaction paradigms and context management challenges in existing AI systems. This study introduces a hierarchical multi-agent framework that transforms passive medical AI systems into proactive inquiry agents through autonomous task orchestration. We developed an eight-agent architecture with centralized control mechanisms that decomposes pre-consultation into four primary tasks: Triage ($T_1$), History of Present Illness collection ($T_2$), Past History collection ($T_3$), and Chief Complaint generation ($T_4$), with $T_1$--$T_3$ further divided into 13 domain-specific subtasks. Evaluated on 1,372 validated electronic health records from a Chinese medical platform across multiple foundation models (GPT-OSS 20B, Qwen3-8B, Phi4-14B), the framework achieved 87.0% accuracy for primary department triage and 80.5% for secondary department classification, with task completion rates reaching 98.2% using agent-driven scheduling versus 93.1% with sequential processing. Clinical quality scores from 18 physicians averaged 4.56 for Chief Complaints, 4.48 for History of Present Illness, and 4.69 for Past History on a 5-point scale, with consultations completed within 12.7 rounds for $T_2$ and 16.9 rounds for $T_3$. The model-agnostic architecture maintained high performance across different foundation models while preserving data privacy through local deployment, demonstrating the potential for autonomous AI systems to enhance pre-consultation efficiency and quality in clinical settings.
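
To make the contrast between sequential processing and dynamic scheduling concrete, here is a minimal sketch of a hierarchical task list and two ways a controller might pick the next subtask. It is not the paper's implementation: the `Subtask` and `Task` classes, the completion scores, the 0.8 threshold, the subtask names under T2 and T3, and the lowest-score heuristic are all assumptions made for illustration. The heuristic stands in for the agent-driven choice, which the abstract does not specify, and T4 is omitted because only T1 to T3 have subtasks.

```python
from dataclasses import dataclass, field


@dataclass
class Subtask:
    name: str
    score: float = 0.0  # hypothetical completion estimate in [0, 1]


@dataclass
class Task:
    name: str
    subtasks: list[Subtask] = field(default_factory=list)

    def pending(self, threshold: float) -> list[Subtask]:
        """Subtasks whose completion estimate is still below the threshold."""
        return [s for s in self.subtasks if s.score < threshold]


def next_sequential(tasks, threshold=0.8):
    """Fixed order: first pending subtask of the first unfinished task."""
    for task in tasks:
        pending = task.pending(threshold)
        if pending:
            return task.name, pending[0].name
    return None  # every subtask is complete


def next_by_lowest_score(tasks, threshold=0.8):
    """Illustrative stand-in for a dynamic choice: within the first
    unfinished task, pick the least-complete pending subtask."""
    for task in tasks:
        pending = task.pending(threshold)
        if pending:
            target = min(pending, key=lambda s: s.score)
            return task.name, target.name
    return None


# Hypothetical state partway through a consultation.
tasks = [
    Task("T1", [Subtask("primary_department", 0.9),
                Subtask("secondary_department", 0.9)]),
    Task("T2", [Subtask("onset", 0.5),
                Subtask("symptoms", 0.2),
                Subtask("treatment", 0.0)]),
    Task("T3", [Subtask("past_illness", 0.0)]),
]

print(next_sequential(tasks))       # ('T2', 'onset')
print(next_by_lowest_score(tasks))  # ('T2', 'treatment')
```

Both functions respect the macro-level task order (T1 before T2 before T3) and differ only in how they choose within a task, which mirrors the balance between macro-diagnostic progression and micro-level symptom collection described above.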

Paper Structure

This paper contains 28 sections, 4 equations, 7 figures, 8 tables.

Figures (7)

  • Figure 1: Hierarchical multi-agent framework architecture for medical consultation workflow. Parts 1-3 correspond to the sections "Task Scheduling and Monitoring", "Information Update and Question Generation", and "Evaluation", respectively.
  • Figure 2: Performance comparison across different foundation models. Subfigures (a)-(c) show results for GPT-OSS 20B, Qwen3-8B, and Phi4-14B respectively. Subfigure (d) illustrates the distribution of incomplete samples across all three models.
  • Figure 3: Triage accuracy across different medical departments. The bar chart shows significant variation in classification performance, with Ophthalmology achieving the highest accuracy (100.0%) and Psychiatry the lowest (30.0%).
  • Figure 4: Mean scores and variance across seven evaluation dimensions for GPT-OSS 20B, Qwen3-8B, and Phi4-14B. Error bars indicate standard deviation.
  • Figure 5: Task completion rates under different scheduling strategies. Our proposed Agent Driven strategy demonstrates superior completion performance compared to the baseline Medical Priority approach.
  • ...and 2 more figures