MMedAgent-RL: Optimizing Multi-Agent Collaboration for Multimodal Medical Reasoning

Peng Xia; Jinglu Wang; Yibo Peng; Kaide Zeng; Xian Wu; Xiangru Tang; Hongtu Zhu; Yun Li; Shujie Liu; Yan Lu; Huaxiu Yao

TL;DR

MMedAgent-RL tackles the generalization gap of single-agent Med-LVLMs by enabling dynamic, reinforcement-learning-driven collaboration among GP and specialist agents in a clinically inspired GP→Specialists→GP loop. It introduces a curriculum-based MARL (C-MARL) framework that first trains a triage GP, then collects specialist outputs, and finally uses GRPO to train an attending physician that balances imitating and correcting expert judgments. Across five medical VQA benchmarks, it achieves state-of-the-art performance, with an average 20.7% gain over supervised fine-tuning baselines, and demonstrates human-like, stepwise reasoning patterns. The work shows strong in-domain and out-of-domain generalization and points to a scalable path toward robust multimodal medical reasoning.
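GRPO, the optimizer named above, replaces a learned value function with a group-relative baseline. Several responses are sampled for the same question, and each response's reward is normalized by the mean and standard deviation of its group. The sketch below shows only that advantage step. The binary correctness reward and the use of the population standard deviation are assumptions for illustration, not details given here. The clipped policy-gradient update and KL penalty that GRPO also uses are omitted.

```python
from statistics import mean, pstdev


def grpo_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Group-relative advantages: score each sampled response against the
    other responses drawn for the same prompt (population std, an assumed choice)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division finite when every response gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled answers to one VQA question
# (assumed reward: 1.0 = correct final answer, 0.0 = incorrect).
rewards = [1.0, 0.0, 1.0, 0.0]
print([round(a, 3) for a in grpo_advantages(rewards)])
```

Correct answers receive positive advantages and incorrect ones negative advantages. When all answers in a group earn the same reward, every advantage is zero, so that group contributes no learning signal.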

Abstract

Medical Large Vision-Language Models (Med-LVLMs) have shown strong potential in multimodal diagnostic tasks. However, existing single-agent models struggle to generalize across diverse medical specialties, which limits their performance. Recent efforts introduce multi-agent collaboration frameworks inspired by clinical workflows, where general practitioners (GPs) and specialists interact in a fixed sequence. Despite these improvements, such static pipelines lack flexibility and adaptability in reasoning. To address this, we propose MMedAgent-RL, a reinforcement learning (RL)-based multi-agent framework that enables dynamic, optimized collaboration among medical agents. Specifically, we train two GP agents based on Qwen2.5-VL via RL: the triage doctor learns to assign patients to appropriate specialties, while the attending physician integrates the judgments of multiple specialists with its own knowledge to make final decisions. To address the inconsistency of specialist outputs, we introduce a curriculum learning (CL)-guided RL strategy that progressively teaches the attending physician to balance imitating specialists against correcting their mistakes. Experiments on five medical VQA benchmarks demonstrate that MMedAgent-RL not only outperforms both open-source and proprietary Med-LVLMs but also exhibits human-like reasoning patterns. Notably, it achieves an average performance gain of 20.7% over supervised fine-tuning baselines.
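The collaboration loop and curriculum described above can be sketched as plain control flow. This is a minimal illustration under stated assumptions, not the authors' implementation. The agent callables, the `Case` type, the helper names, and the three-stage difficulty rule are all hypothetical. The difficulty rule ranks training cases by how many specialists answered correctly, moving from cases where imitation suffices to cases where the attending physician must override every specialist. The paper's actual curriculum criteria and RL training loop are not reproduced here.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical agent interfaces. In MMedAgent-RL the GP agents are
# Qwen2.5-VL models; here every agent is a plain callable for illustration.
Triage = Callable[[str], list[str]]          # question -> specialties to consult
Specialist = Callable[[str, str], str]       # (specialty, question) -> opinion
Attending = Callable[[str, list[str]], str]  # (question, opinions) -> final answer


@dataclass
class Case:
    question: str
    answer: str  # ground-truth label, used only to build the curriculum


def gather_opinions(case: Case, triage: Triage, specialist: Specialist) -> list[str]:
    """GP -> Specialists: triage picks specialties, each one gives an opinion."""
    return [specialist(s, case.question) for s in triage(case.question)]


def consult(case: Case, triage: Triage, specialist: Specialist, attending: Attending) -> str:
    """Full GP -> Specialists -> GP loop for one case."""
    return attending(case.question, gather_opinions(case, triage, specialist))


def curriculum_stage(opinions: Sequence[str], answer: str) -> int:
    """Assumed difficulty levels: 0 = every specialist is right (imitate),
    1 = specialists disagree, 2 = no specialist is right (must correct)."""
    n_right = sum(o == answer for o in opinions)
    if opinions and n_right == len(opinions):
        return 0
    return 2 if n_right == 0 else 1


def order_by_curriculum(cases: list[Case], triage: Triage, specialist: Specialist) -> list[Case]:
    """Sort training cases easy -> hard; sorted() is stable within a stage."""
    return sorted(
        cases,
        key=lambda c: curriculum_stage(gather_opinions(c, triage, specialist), c.answer),
    )


# Toy agents: always consult two specialties; radiology answers "yes",
# pathology answers "yes" only when the question mentions a lesion.
def triage(question: str) -> list[str]:
    return ["radiology", "pathology"]


def specialist(specialty: str, question: str) -> str:
    return "yes" if specialty == "radiology" or "lesion" in question else "no"


def attending(question: str, opinions: list[str]) -> str:
    return Counter(opinions).most_common(1)[0][0]  # majority-vote placeholder, not a trained model


cases = [
    Case("Any lesion present?", "no"),   # both specialists wrong -> stage 2
    Case("Is the scan normal?", "yes"),  # specialists disagree   -> stage 1
    Case("Is there a lesion?", "yes"),   # both specialists right -> stage 0
]
print([c.question for c in order_by_curriculum(cases, triage, specialist)])
print(consult(cases[2], triage, specialist, attending))
```

In a real training run, the majority-vote `attending` stub would be replaced by the policy being optimized with RL, and each curriculum stage would be trained in turn.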
