Evolving Medical Imaging Agents via Experience-driven Self-skill Discovery

Lin Fan, Pengyu Dai, Zhipeng Deng, Haolin Wang, Xun Gong, Yefeng Zheng, Yafei Ou

TL;DR

This paper proposes MACRO, a self-evolving, experience-augmented medical agent that shifts from static tool composition to experience-driven tool discovery, bridging the gap between brittle static tool use and adaptive, context-aware clinical AI assistance.

Abstract

Clinical image interpretation is inherently multi-step and tool-centric: clinicians iteratively combine visual evidence with patient context, quantify findings, and refine their decisions through a sequence of specialized procedures. While LLM-based agents promise to orchestrate such heterogeneous medical tools, existing systems treat tool sets and invocation strategies as static after deployment. This design is brittle under real-world domain shifts, variation across tasks, and evolving diagnostic requirements, where predefined tool chains frequently degrade and demand costly manual re-design. We propose MACRO, a self-evolving, experience-augmented medical agent that shifts from static tool composition to experience-driven tool discovery. From verified execution trajectories, the agent autonomously identifies recurring effective multi-step tool sequences, synthesizes them into reusable composite tools, and registers these as new high-level primitives that continuously expand its behavioral repertoire. A lightweight image-feature memory grounds tool selection in a visual-clinical context, while a GRPO-like training loop reinforces reliable invocation of discovered composites, enabling closed-loop self-improvement with minimal supervision. Extensive experiments across diverse medical imaging datasets and tasks demonstrate that autonomous composite tool discovery consistently improves multi-step orchestration accuracy and cross-domain generalization over strong baselines and recent state-of-the-art agentic methods, bridging the gap between brittle static tool use and adaptive, context-aware clinical AI assistance. Code will be available upon acceptance.

Paper Structure (30 sections, 9 equations, 4 figures, 3 tables)

Figures (4)

  • Figure 1: Existing medical agents vs. our MACRO. Existing medical agents rely on static, predefined tool sets and invocation strategies that struggle to adapt to variations across imaging domains or to intra-class sample differences, often leading to failures. In contrast, MACRO enhances adaptability by discovering and integrating composite tools, that is, validated tool sequences capable of executing multi-step operations. These sequences are distilled from repeatedly successful trajectories in real-world workflows, enabling more robust performance in the face of clinical variability.
  • Figure 2: Pipeline of proposed MACRO. For each input image and multi-step trajectory, MACRO first retrieves relevant experiences from a memory store using image feature similarity. At every step, the model generates a response; if the generated tool sequence contains any registered composite tool $\mathbf{c} \in \mathcal{C}$, a positive reward is assigned. Tool calls are executed, and their results are appended to the evidence, informing subsequent steps. At the end of the trajectory, if the final answer matches the ground truth, the entire tool sequence is stored back into $\mathcal{M}$, and its contiguous subsequences increment the frequency counts in the composite registry $\mathcal{C}$, enabling online discovery of new composite tools.
  • Figure 3: Closed-loop learning in MACRO yields two measurable benefits: (a) tool complexity decreases as higher-level tools progressively replace multi-step patterns; (b) as MACRO acquires stronger tools and better demonstrations, performance with abstract tools improves by 8.8% over the basic tool library.
  • Figure 4: A case study on glaucoma diagnosis, illustrating the detailed workflow within the MACRO framework. The green text represents the call action operations, while the surrounding reasoning reflects the model's explanation and thought process behind each action.
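
The composite-discovery loop described in the Figure 2 caption can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `CompositeRegistry`, the frequency threshold `min_count`, the subsequence length bounds, the binary reward value, and the tool names in the usage example are all hypothetical choices made for illustration. The sketch assumes that a contiguous tool subsequence becomes a composite once it has appeared in enough trajectories whose final answer matched the ground truth, and that a generated sequence earns a positive reward when it contains any registered composite.

```python
from collections import Counter
from typing import Iterable, Sequence, Set, Tuple

ToolSeq = Tuple[str, ...]


def contiguous_subsequences(
    seq: Sequence[str], min_len: int = 2, max_len: int = 4
) -> Iterable[ToolSeq]:
    """Yield every contiguous run of tools with min_len <= length <= max_len."""
    n = len(seq)
    for length in range(min_len, min(max_len, n) + 1):
        for start in range(n - length + 1):
            yield tuple(seq[start:start + length])


class CompositeRegistry:
    """Hypothetical stand-in for the composite registry C in Figure 2."""

    def __init__(self, min_count: int = 2) -> None:
        # Assumption: a subsequence is promoted once it has been seen in
        # at least `min_count` successful trajectories.
        self.min_count = min_count
        self.counts: Counter = Counter()

    def update(self, tool_sequence: Sequence[str], answer_correct: bool) -> None:
        """Count subsequences only for trajectories with a correct final answer."""
        if not answer_correct:
            return
        # Each distinct subsequence is counted once per trajectory.
        for sub in set(contiguous_subsequences(tool_sequence)):
            self.counts[sub] += 1

    def composites(self) -> Set[ToolSeq]:
        """Return the subsequences frequent enough to act as composite tools."""
        return {s for s, c in self.counts.items() if c >= self.min_count}


def composite_reward(
    tool_sequence: Sequence[str], composites: Set[ToolSeq], bonus: float = 1.0
) -> float:
    """Return `bonus` if the sequence contains any registered composite, else 0."""
    subs = set(contiguous_subsequences(tool_sequence))
    return bonus if subs & composites else 0.0


# Usage with made-up tool names.
registry = CompositeRegistry(min_count=2)
registry.update(["segment_disc", "measure_cdr", "classify"], answer_correct=True)
registry.update(["enhance", "segment_disc", "measure_cdr"], answer_correct=True)
registry.update(["enhance", "classify"], answer_correct=False)  # ignored
```

In this toy run, only the pair `("segment_disc", "measure_cdr")` recurs across two successful trajectories, so it is the only subsequence promoted to a composite. The failed trajectory contributes nothing to the counts, which mirrors the caption's condition that sequences are stored only when the final answer matches the ground truth.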