HealthFlow: A Self-Evolving AI Agent with Meta Planning for Autonomous Healthcare Research

Yinghao Zhu, Yifan Qi, Zixiang Wang, Lei Gu, Dehao Sui, Haoran Hu, Xichen Zhang, Ziyi He, Junjun He, Liantao Ma, Lequan Yu

TL;DR

HealthFlow introduces a self-evolving AI agent driven by meta-level planning to autonomously conduct healthcare research. The architecture comprises four specialized agents: meta (strategic planner), executor (execution engine), evaluator (short-term corrector), and reflector (long-term knowledge synthesizer), together with a persistent experience memory that continuously evolves the agent's high-level strategies. The authors also present EHRFlowBench, a domain-grounded benchmark with 110 tasks derived from peer-reviewed literature to rigorously evaluate autonomous healthcare research workflows. Empirical results show that HealthFlow outperforms state-of-the-art agent frameworks on open-ended healthcare research tasks and maintains competitive performance on more knowledge-intensive, tool-light benchmarks, underscoring the value of learning how to plan and manage research workflows rather than merely executing tasks. The work advances a paradigm in which intelligent systems operationalize procedural knowledge embedded in scientific content, paving the way for more autonomous and effective AI-assisted healthcare research while acknowledging the need for human-in-the-loop safeguards and ethical considerations.
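To make the division of labor among the four roles concrete, the loop can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the names `ExperienceMemory`, `run_task`, and `demo`, the word-overlap retrieval, the `pass_score` threshold, the retry limit, and the stub agents are all hypothetical stand-ins for components that would be LLM-backed in practice.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ExperienceMemory:
    """Persistent store of synthesized experiences (plain strings in this sketch)."""
    items: List[str] = field(default_factory=list)

    def retrieve(self, task: str, k: int = 3) -> List[str]:
        # Hypothetical retrieval: rank stored experiences by word overlap with the task.
        words = set(task.lower().split())
        ranked = sorted(self.items, key=lambda e: -len(words & set(e.lower().split())))
        return ranked[:k]

    def add(self, experience: str) -> None:
        self.items.append(experience)


def run_task(task, memory, plan, execute, evaluate, reflect,
             max_attempts=3, pass_score=0.5):
    """One pass through the plan / execute / evaluate / reflect loop.

    max_attempts and pass_score are assumptions made for this sketch.
    """
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        strategy = plan(task, memory.retrieve(task), feedback)  # meta: strategic plan
        result, log = execute(strategy)                         # executor: run the plan
        score, feedback = evaluate(task, result, log)           # evaluator: short-term correction
        if score >= pass_score:
            memory.add(reflect(task, strategy, log))            # reflector: long-term knowledge
            return result, attempt
    return None, max_attempts


def demo():
    """Run the loop with trivial stub agents (illustrative only)."""
    memory = ExperienceMemory()
    plan = lambda task, exps, fb: f"plan for {task} | hints={len(exps)} | fix={fb}"
    execute = lambda strategy: (strategy.upper(), f"ran: {strategy}")
    # Stub evaluator: rejects the run until its feedback shows up in the execution log.
    evaluate = lambda task, result, log: (
        (1.0, "") if "fix=check units" in log else (0.0, "check units")
    )
    reflect = lambda task, strategy, log: f"heuristic: {task} needs unit checks"
    outcome = run_task("cohort mortality analysis", memory, plan, execute, evaluate, reflect)
    return outcome, memory
```

With these stubs, the first attempt fails evaluation, the feedback is folded into the second plan, and one experience is written to memory only after the successful attempt. Later tasks with overlapping wording would then retrieve that experience at planning time.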

Abstract

The rapid proliferation of scientific knowledge presents a grand challenge: transforming this vast repository of information into an active engine for discovery, especially in high-stakes domains like healthcare. Current AI agents, however, are constrained by static, predefined strategies, limiting their ability to navigate the complex, evolving ecosystem of scientific research. This paper introduces HealthFlow, a self-evolving AI agent that overcomes this limitation through a novel meta-level evolution mechanism. HealthFlow autonomously refines its high-level problem-solving policies by distilling procedural successes and failures into a durable, structured knowledge base, enabling it to learn not just how to use tools, but how to strategize. To anchor our research and provide a community resource, we introduce EHRFlowBench, a new benchmark featuring complex health data analysis tasks systematically derived from peer-reviewed scientific literature. Our experiments demonstrate that HealthFlow's self-evolving approach significantly outperforms state-of-the-art agent frameworks. This work offers a new paradigm for intelligent systems that can learn to operationalize the procedural knowledge embedded in scientific content, marking a critical step toward more autonomous and effective AI for healthcare scientific discovery.

Paper Structure

This paper contains 81 sections, 10 equations, 10 figures, 5 tables, 1 algorithm.

Figures (10)

  • Figure 1: The self-evolving architecture of HealthFlow. The framework operates in a continuous learning loop. (1) A task is received by the meta agent, which generates a strategic plan by retrieving relevant past experiences. (2) The executor agent executes this plan using tools, producing results and detailed logs. (3) The evaluator agent assesses the execution, providing scores and feedback for immediate, short-term correction. (4) Upon successful completion, the reflector agent analyzes the entire process to synthesize abstract, structured experience (e.g., heuristics, workflow patterns). Experiences are stored in a persistent memory, augmenting the meta agent's strategic capabilities for future tasks and enabling the system's long-term, meta-level evolution.
  • Figure 2: Task category distribution in EHRFlowBench. The initial distribution of 585 LLM-extracted tasks (left) is refined through manual curation and stratified sampling into a final set of 110 tasks across 10 core research categories (right), with irrelevant categories like "ablation study" being discarded.
  • Figure 3: Head-to-head performance of HealthFlow against leading agent frameworks on (a) EHRFlowBench and (b) MedAgentBoard. Each bar shows the distribution for all tasks in a direct comparison against a specific baseline. Outcomes are categorized as a tie if the score difference is $\leq 0.25$.
  • Figure 4: Distribution of experience synthesis and retrieval across EHRFlowBench and MedAgentBoard benchmarks.
  • Figure 5: Distribution of synthesized experience categories.
  • ...and 5 more figures
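
The tie rule in the Figure 3 caption can be illustrated with a small sketch. It assumes the threshold applies to the absolute difference between two per-task scores. The function names and the example scores are hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Iterable, Tuple

TIE_MARGIN = 0.25  # threshold stated in the Figure 3 caption


def outcome(healthflow_score: float, baseline_score: float,
            margin: float = TIE_MARGIN) -> str:
    """Classify one task as a win, tie, or loss for HealthFlow."""
    diff = healthflow_score - baseline_score
    if abs(diff) <= margin:  # assumption: the tie rule uses the absolute difference
        return "tie"
    return "win" if diff > 0 else "loss"


def tally(pairs: Iterable[Tuple[float, float]]) -> Counter:
    """Count outcomes over (healthflow_score, baseline_score) pairs."""
    return Counter(outcome(h, b) for h, b in pairs)


# Made-up scores, for illustration only.
example = [(4.0, 3.0), (3.0, 3.25), (2.0, 3.5)]
counts = tally(example)
```

Each bar in such a head-to-head chart would then correspond to one `tally` against a single baseline.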