MRG-R1: Reinforcement Learning for Clinically Aligned Medical Report Generation

Pengyu Wang, Shuchang Ye, Usman Naseem, Jinman Kim

TL;DR

This work tackles the misalignment between token-level training and clinical correctness in medical report generation by introducing semantic-driven reinforcement learning (SRL) on a medical LVLM. It combines Group Relative Policy Optimization (GRPO) with a report-level Margin CheXbert Cosine Similarity (MCCS) reward and a lightweight <think>→<report> format constraint to guide long-form radiology reports toward clinically accurate, auditable content. Across IU X-Ray and MIMIC-CXR, the approach achieves state-of-the-art clinical efficacy (CE) scores and demonstrates the superiority of semantic, report-level supervision over traditional token-level objectives. The results highlight the potential of SRL to advance clinically reliable medical report generation and point to future work integrating richer semantic signals and multi-modal data for broader radiology applications.

Abstract

Medical report generation (MRG) aims to automatically derive radiology-style reports from medical images to aid clinical decision-making. However, existing methods often generate text that mimics the linguistic style of radiologists but fails to guarantee clinical correctness, because they are trained on token-level objectives that focus on word choice and sentence structure rather than medical accuracy. We propose a semantic-driven reinforcement learning (SRL) method for medical report generation, applied to a large vision-language model (LVLM). SRL adopts Group Relative Policy Optimization (GRPO) to encourage learning guided by clinical correctness rather than imitation of language style. Specifically, we optimize a report-level reward: a margin-based cosine similarity (MCCS) computed between key radiological findings extracted from generated and reference reports, which directly rewards clinical-label agreement and improves semantic correctness. A lightweight reasoning format constraint further guides the model to generate structured "thinking report" outputs. We evaluate Medical Report Generation with Semantic-driven Reinforcement Learning (MRG-R1) on two datasets, IU X-Ray and MIMIC-CXR, using clinical efficacy (CE) metrics. MRG-R1 achieves state-of-the-art performance, with CE-F1 of 51.88 on IU X-Ray and 40.39 on MIMIC-CXR. We find that label-semantic reinforcement outperforms conventional token-level supervision. These results indicate that optimizing a clinically grounded, report-level reward, rather than token overlap, meaningfully improves clinical correctness. This work is an early exploration of semantic reinforcement for supervising medical correctness in the training of medical large vision-language models (Med-LVLMs).
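
As a rough illustration of the report-level reward described in the abstract, the sketch below scores a generated report against its reference by comparing clinical label vectors. The abstract does not give the formula, so everything here is an assumption made for illustration: the function names, the binary label encoding (standing in for the output of a labeler such as CheXbert), the margin value of 0.5, and the specific way the margin is applied (subtract, clip at zero, rescale).

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length label vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption: two all-zero vectors (no findings in either report)
        # count as full agreement; one all-zero vector counts as none.
        return 1.0 if norm_u == norm_v else 0.0
    return dot / (norm_u * norm_v)


def margin_cosine_reward(
    gen_labels: Sequence[float],
    ref_labels: Sequence[float],
    margin: float = 0.5,  # hypothetical value, not taken from the paper
) -> float:
    """Hypothetical margin-based reward: similarities at or below the
    margin earn nothing; the remainder is rescaled to the range [0, 1]."""
    if len(gen_labels) != len(ref_labels):
        raise ValueError("label vectors must have the same length")
    cos = cosine_similarity(gen_labels, ref_labels)
    return max(0.0, cos - margin) / (1.0 - margin)
```

Under these assumptions, a generated report that agrees with the reference on only a few findings receives zero reward instead of partial credit, which is one plausible reason to use a margin at all. In practice, the label vectors would come from running a clinical labeler over both the generated and the reference report.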

Paper Structure

This paper contains 21 sections, 6 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Overview of SRL. For each study, the policy samples a group of candidate reports; a margin CheXbert cosine reward (MCCS) and a lightweight format reward are combined to compute group-relative advantages for GRPO updates under a KL constraint to a reference policy.
  • Figure 2: An example case from IU X-Ray (X-ray image) used for inference in MRG qualitative comparisons. The information in the ground truth report is labeled from 1 to 6 and highlighted separately. The generated reports are labeled according to the ground truth report and highlighted in different colors to show how each generated sequence differs from the ground truth report: (1) green: consistent; (2) red: incorrect information; (3) unhighlighted: not included in the ground truth.
  • Figure 3: An example case from MIMIC-CXR (X-ray image) used for inference in MRG qualitative comparisons. The information in the ground truth report is labeled from 1 to 3 and highlighted separately. The generated reports are labeled according to the ground truth report and highlighted in different colors to show how each generated sequence differs from the ground truth report: (1) green: consistent; (2) red: incorrect information; (3) unhighlighted: not included in the ground truth.
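
The reward combination and group-relative advantage step summarized in the Figure 1 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the closing tags `</think>` and `</report>`, the function names, and the `format_weight` value are hypothetical, and the standardization by group mean and standard deviation follows the common GRPO formulation. The KL constraint to the reference policy and the policy update itself are omitted.

```python
import re
import statistics
from typing import List, Sequence

# Assumed output layout: <think>...</think> followed by <report>...</report>.
_FORMAT = re.compile(
    r"\s*<think>.+?</think>\s*<report>.+?</report>\s*", re.DOTALL
)


def format_reward(output: str) -> float:
    """Return 1.0 if the whole output follows the assumed layout, else 0.0."""
    return 1.0 if _FORMAT.fullmatch(output) else 0.0


def total_reward(
    semantic: float, output: str, format_weight: float = 0.1
) -> float:
    """Combine a semantic (MCCS-style) score with the format score.
    The additive form and the weight are illustrative assumptions."""
    return semantic + format_weight * format_reward(output)


def group_relative_advantages(
    rewards: Sequence[float], eps: float = 1e-8
) -> List[float]:
    """Standardize rewards within one group of sampled reports:
    A_i = (r_i - mean(r)) / (std(r) + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]
```

Because each advantage is measured against the other candidates sampled for the same study, no separate value network is needed: a candidate is reinforced only when it scores above its own group's average, and a group with identical rewards contributes no learning signal.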