HealthFlow: A Self-Evolving AI Agent with Meta Planning for Autonomous Healthcare Research
Yinghao Zhu, Yifan Qi, Zixiang Wang, Lei Gu, Dehao Sui, Haoran Hu, Xichen Zhang, Ziyi He, Junjun He, Liantao Ma, Lequan Yu
TL;DR
HealthFlow is a self-evolving AI agent that uses meta-level planning to conduct healthcare research autonomously. Its architecture comprises four specialized agents, namely a meta agent (strategic planner), an executor (execution engine), an evaluator (short-term corrector), and a reflector (long-term knowledge synthesizer), together with a persistent experience memory that continuously evolves the agent's high-level strategies. The authors also present EHRFlowBench, a domain-grounded benchmark of 110 tasks derived from peer-reviewed literature for rigorously evaluating autonomous healthcare research workflows. Empirical results show that HealthFlow outperforms state-of-the-art agent frameworks on open-ended healthcare research tasks while remaining competitive on more knowledge-intensive, tool-light benchmarks, underscoring the value of learning how to plan and manage research workflows rather than merely executing tasks. The work advances a paradigm in which intelligent systems operationalize the procedural knowledge embedded in scientific content, paving the way for more autonomous and effective AI-assisted healthcare research while acknowledging the need for human-in-the-loop safeguards and ethical considerations.
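The four-role architecture can be pictured as a single control loop. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: each role is a plain Python callable standing in for what would be an LLM call. The class `SelfEvolvingAgent`, the role signatures, and the `max_retries` parameter are all hypothetical. The evaluator drives short-term correction inside one task, and the reflector writes a lesson back to memory so the meta agent can use it on later tasks.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical role signatures; in a real system each would wrap an LLM call.
Planner = Callable[[str, List[str]], str]           # (task, memory) -> plan
Executor = Callable[[str, str], str]                # (plan, feedback) -> result
Evaluator = Callable[[str, str], Tuple[bool, str]]  # (task, result) -> (ok, feedback)
Reflector = Callable[[str, str, bool], str]         # (task, plan, ok) -> lesson


@dataclass
class SelfEvolvingAgent:
    meta: Planner
    executor: Executor
    evaluator: Evaluator
    reflector: Reflector
    memory: List[str] = field(default_factory=list)  # persistent experience memory
    max_retries: int = 2                              # hypothetical retry budget

    def solve(self, task: str) -> Tuple[str, bool]:
        # Meta agent: plan using lessons accumulated from earlier tasks.
        plan = self.meta(task, self.memory)
        feedback, result, ok = "", "", False
        # Executor + evaluator: short-term correction within a single task.
        for _ in range(self.max_retries + 1):
            result = self.executor(plan, feedback)
            ok, feedback = self.evaluator(task, result)
            if ok:
                break
        # Reflector: long-term synthesis, written back to memory for future plans.
        self.memory.append(self.reflector(task, plan, ok))
        return result, ok


# Toy usage: the executor only finishes once the evaluator has given feedback.
agent = SelfEvolvingAgent(
    meta=lambda task, mem: f"plan for {task} using {len(mem)} lessons",
    executor=lambda plan, fb: "done" if fb else "draft",
    evaluator=lambda task, res: (res == "done", "" if res == "done" else "finish the analysis"),
    reflector=lambda task, plan, ok: f"{task}: {'success' if ok else 'failure'}",
)
result, ok = agent.solve("cohort summary")
```

Keeping the evaluator's retry loop separate from the reflector's memory update mirrors the distinction the paper draws between short-term correction and long-term strategy evolution.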
Abstract
The rapid proliferation of scientific knowledge presents a grand challenge: transforming this vast repository of information into an active engine for discovery, especially in high-stakes domains like healthcare. Current AI agents, however, are constrained by static, predefined strategies, limiting their ability to navigate the complex, evolving ecosystem of scientific research. This paper introduces HealthFlow, a self-evolving AI agent that overcomes this limitation through a novel meta-level evolution mechanism. HealthFlow autonomously refines its high-level problem-solving policies by distilling procedural successes and failures into a durable, structured knowledge base, enabling it to learn not just how to use tools, but how to strategize. To anchor our research and provide a community resource, we introduce EHRFlowBench, a new benchmark featuring complex health data analysis tasks systematically derived from peer-reviewed scientific literature. Our experiments demonstrate that HealthFlow's self-evolving approach significantly outperforms state-of-the-art agent frameworks. This work offers a new paradigm for intelligent systems that can learn to operationalize the procedural knowledge embedded in scientific content, marking a critical step toward more autonomous and effective AI for healthcare scientific discovery.
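The abstract describes distilling procedural successes and failures into a durable, structured knowledge base without specifying its schema. As a rough illustration only, the sketch below assumes a hypothetical record type `Experience` (task type, outcome, lesson), a hypothetical `ExperienceBase` store, naive exact-match retrieval, and JSON serialization for durability. None of these names or choices come from the paper, and the example lessons are purely illustrative.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class Experience:
    """One distilled lesson (hypothetical schema)."""
    task_type: str  # e.g. "survival analysis"
    outcome: str    # "success" or "failure"
    lesson: str     # the reusable strategic insight


class ExperienceBase:
    def __init__(self) -> None:
        self.records: List[Experience] = []

    def distill(self, task_type: str, succeeded: bool, lesson: str) -> None:
        # Keep both successes and failures; failures act as warnings for later plans.
        outcome = "success" if succeeded else "failure"
        self.records.append(Experience(task_type, outcome, lesson))

    def retrieve(self, task_type: str, k: int = 3) -> List[Experience]:
        # Naive retrieval: exact task-type match, most recent first, at most k items.
        matches = [r for r in self.records if r.task_type == task_type]
        return matches[::-1][:k]

    def to_json(self) -> str:
        # Durability: serialize so lessons survive across sessions.
        return json.dumps([asdict(r) for r in self.records])

    @classmethod
    def from_json(cls, text: str) -> "ExperienceBase":
        kb = cls()
        kb.records = [Experience(**d) for d in json.loads(text)]
        return kb


# Illustrative usage: store two lessons of one type, round-trip through JSON, retrieve.
kb = ExperienceBase()
kb.distill("survival analysis", False, "Check censoring before fitting a Cox model.")
kb.distill("survival analysis", True, "Report the concordance index alongside hazard ratios.")
kb.distill("cohort statistics", True, "Stratify summary tables by outcome label.")
restored = ExperienceBase.from_json(kb.to_json())
hits = restored.retrieve("survival analysis")
```

A production system would more plausibly use semantic (embedding-based) retrieval rather than exact string matching; the simpler rule is used here only to keep the sketch self-contained.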
