CureAgent: A Training-Free Executor-Analyst Framework for Clinical Reasoning
Ting-Ting Xie, Yixin Zhang
TL;DR
This work tackles context-utilization failures in clinical reasoning LLMs by introducing a training-free Executor-Analyst framework that decouples precise tool retrieval (Executor) from high-level reasoning (Analyst). A Stratified Ensemble topology preserves evidentiary diversity and reduces information bottlenecks, achieving state-of-the-art results on CURE-Bench without end-to-end fine-tuning. The study also reveals a Context-Performance Paradox, in which extending reasoning contexts beyond 12k tokens introduces noise that degrades accuracy, and a Curse of Dimensionality in expanding tool spaces, and it proposes hierarchical indexing and training-free adaptation as remedies. The framework demonstrates strong, scalable potential for trustworthy AI-driven clinical decision support, with code released for reproducibility.
Abstract
Current clinical agents built on small LLMs, such as TxAgent, suffer from a \textit{Context Utilization Failure}: thanks to supervised finetuning, the models successfully retrieve biomedical evidence but fail to ground their diagnoses in it. In this work, we propose the Executor-Analyst Framework, a modular architecture that decouples the syntactic precision of tool execution from the semantic robustness of clinical reasoning. By orchestrating specialized TxAgents (Executors) with long-context foundation models (Analysts), we mitigate the reasoning deficits observed in monolithic models. Beyond simple modularity, we demonstrate that a Stratified Ensemble strategy significantly outperforms global pooling by preserving evidentiary diversity, effectively addressing the information bottleneck. Furthermore, our stress tests reveal critical scaling insights: (1) a \textit{Context-Performance Paradox}, where extending reasoning contexts beyond 12k tokens introduces noise that degrades accuracy; and (2) the \textit{Curse of Dimensionality} in action spaces, where expanding toolsets necessitates hierarchical retrieval strategies. Crucially, our approach underscores the potential of training-free architectural engineering, achieving state-of-the-art performance on CURE-Bench without expensive end-to-end finetuning. This provides a scalable, agile foundation for the next generation of trustworthy AI-driven therapeutics. Code is available at https://github.com/June01/CureAgent.
