Multi-Modal Multi-Agent Reinforcement Learning for Radiology Report Generation: Radiologist-Like Workflow with Clinically Verifiable Rewards

Kaito Baba; Satoshi Kodera

Abstract

We propose MARL-Rad, a novel multi-modal multi-agent reinforcement learning framework for radiology report generation that coordinates region-specific agents and a global integrating agent, optimized via clinically verifiable rewards. Unlike prior single-model reinforcement learning or post-hoc agentization of independently trained models, our method jointly trains multiple agents and optimizes the entire agent system through reinforcement learning. Experiments on the MIMIC-CXR and IU X-ray datasets show that MARL-Rad consistently improves clinical efficacy (CE) metrics such as RadGraph, CheXbert, and GREEN scores, achieving state-of-the-art CE performance. Further analyses confirm that MARL-Rad enhances laterality consistency and produces more accurate, detail-informed reports.

Paper Structure (27 sections, 6 equations, 3 figures, 4 tables)

Introduction
Related work
Reinforcement learning for LLMs
Agentic systems with LLMs
LLMs for radiology report generation (RRG)
Reinforcement learning for RRG
Agentic systems for RRG
Method
Preliminaries
Multi-agent extension of GSPO
Multi-agent reinforcement learning on RRG
Experiments
Experimental setup
Datasets
Evaluation metrics
...and 12 more sections

Figures (3)

Figure 1: Comparison with previous state-of-the-art (SOTA) methods on the MIMIC-CXR and IU X-ray datasets. Our method achieves SOTA performance in clinical efficacy metrics, showing the highest scores in both RadGraph F1 and CheXbert F1.
Figure 2: Overview of the proposed multi-agent RL framework. Region-specific agents and a global integrating agent collaboratively generate the radiology report, and the entire agent system is jointly optimized through RL based on clinically verifiable rewards.
Figure 3: Example output from MARL-Rad. Region-specific agents consistently focus on their assigned regions and generate regional diagnoses. The global integrating agent synthesizes these drafts into a concise and coherent report while adding relevant global findings. The resulting final report is detailed and aligns well with the ground truth, correctly concluding with "No acute cardiopulmonary process."
