MRG-R1: Reinforcement Learning for Clinically Aligned Medical Report Generation
Pengyu Wang, Shuchang Ye, Usman Naseem, Jinman Kim
TL;DR
This work tackles the misalignment between token-level training and clinical correctness in medical report generation by introducing semantic-driven reinforcement learning (SRL) on a medical LVLM. It combines Group Relative Policy Optimization (GRPO) with a report-level Margin CheXbert Cosine Similarity (MCCS) reward and a lightweight <think>→<report> format constraint to guide long-form radiology reports toward clinically accurate, auditable content. Across IU X-Ray and MIMIC-CXR, the approach achieves state-of-the-art clinical efficacy (CE) scores and demonstrates the superiority of semantic, report-level supervision over traditional token-level objectives. The results highlight the potential of SRL to advance clinically reliable medical report generation and point to future work integrating richer semantic signals and multi-modal data for broader radiology applications.
Abstract
Medical report generation (MRG) aims to automatically derive radiology-style reports from medical images to aid clinical decision-making. However, existing methods often generate text that mimics the linguistic style of radiologists but fails to guarantee clinical correctness, because they are trained on token-level objectives that focus on word choice and sentence structure rather than medical accuracy. We propose a semantic-driven reinforcement learning (SRL) method for medical report generation, applied to a large vision-language model (LVLM). SRL adopts Group Relative Policy Optimization (GRPO) to encourage learning guided by clinical correctness rather than imitation of language style. Specifically, we optimize a report-level reward: a margin-based cosine similarity (MCCS) computed between the key radiological findings extracted from the generated and reference reports, which directly rewards clinical-label agreement and improves semantic correctness. A lightweight reasoning format constraint further guides the model to generate structured "thinking then report" outputs. We evaluate our model, Medical Report Generation with Semantic-driven Reinforcement Learning (MRG-R1), on two datasets, IU X-Ray and MIMIC-CXR, using clinical efficacy (CE) metrics. MRG-R1 achieves state-of-the-art performance, with a CE-F1 of 51.88 on IU X-Ray and 40.39 on MIMIC-CXR. We find that label-semantic reinforcement outperforms conventional token-level supervision. These results indicate that optimizing a clinically grounded, report-level reward, rather than token overlap, meaningfully improves clinical correctness. This work is an early exploration of semantic reinforcement for supervising medical correctness in the training of medical large vision-language models (Med-LVLMs).
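The reward design described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the exact MCCS formula, so the margin form below (similarity minus a margin, clipped at zero and rescaled), the margin value of 0.5, the closing </think> and </report> tags, and all function names are assumptions made for illustration. The finding vectors stand in for labels that a labeler such as CheXbert would extract from each report. The group-relative advantage follows the usual GRPO recipe of standardizing rewards within a group of sampled reports.

```python
import math
import re
from statistics import mean, pstdev


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length label vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # no findings on one side: treat as no agreement
    return dot / (norm_a * norm_b)


def margin_cosine_reward(gen_labels, ref_labels, margin=0.5):
    """Hypothetical margin form: reward is zero until similarity
    exceeds `margin`, then grows linearly to 1 at perfect agreement."""
    sim = cosine_similarity(gen_labels, ref_labels)
    return max(0.0, sim - margin) / (1.0 - margin)


# Assumed output layout: a reasoning block followed by a report block.
FORMAT_RE = re.compile(
    r"^\s*<think>.*?</think>\s*<report>.*?</report>\s*$", re.DOTALL
)


def format_reward(text):
    """1.0 if the output follows the assumed think-then-report layout."""
    return 1.0 if FORMAT_RE.match(text) else 0.0


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize rewards within one group of
    reports sampled for the same image."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    ref = [1, 0, 1, 0]  # toy finding labels from the reference report
    group = [[1, 0, 1, 0], [1, 0, 0, 0], [0, 1, 0, 1]]  # sampled reports
    rewards = [margin_cosine_reward(g, ref) for g in group]
    print(rewards)
    print(group_relative_advantages(rewards))
```

In this sketch a sampled report whose findings match the reference receives a positive advantage, while one with disjoint findings receives a negative one, regardless of how similar the wording is. That is the sense in which a report-level semantic reward differs from token-overlap supervision.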
