Radiology Workflow-Guided Hierarchical Reinforcement Fine-Tuning for Medical Report Generation

Bodong Du, Honglong Yang, Xiaomeng Li

TL;DR

RadFlow addresses the misalignment between descriptive findings and diagnostic impressions in medical report generation with a hierarchical reinforcement fine-tuning framework that mirrors radiologists' workflow. It decomposes the reward into two parts: a global component covering fluent, clinically faithful Findings and cross-sectional consistency, and a local component covering Impression accuracy. These rewards are augmented by Target Exploration and a critical-aware policy optimization (CAPO) scheme that tightens updates for high-stakes cases. The approach is backed by theoretical guarantees on policy stability and evaluated on carotid ultrasound and chest X-ray datasets, where RadFlow achieves better diagnostic coherence and overall report quality than state-of-the-art baselines. The work points toward incorporating structured clinical reasoning into end-to-end learning, with potential extensions to more modalities and to human-in-the-loop feedback to further improve reliability and safety in clinical reporting.

Abstract

Radiologists compose diagnostic reports through a structured workflow: they describe visual findings, summarize them into impressions, and carefully refine statements in clinically critical cases. However, most existing medical report generation (MRG) systems treat reports as flat sequences, overlooking this hierarchical organization and leading to inconsistencies between descriptive and diagnostic content. To align model behavior with real-world reporting practices, we propose RadFlow, a hierarchical workflow-guided reinforcement optimization framework that explicitly models the structured nature of clinical reporting. RadFlow introduces a clinically grounded reward hierarchy that mirrors the organization of radiological reports. At the global level, the reward integrates linguistic fluency, medical-domain correctness, and cross-sectional consistency between Finding and Impression, promoting coherent and clinically faithful narratives. At the local level, a section-specific reward emphasizes Impression quality, reflecting its central role in diagnostic accuracy. Furthermore, a critical-aware policy optimization mechanism adaptively regularizes learning for high-risk or clinically sensitive cases, emulating the cautious refinement behavior of radiologists when documenting critical findings. Together, these components translate the structured reporting paradigm into the reinforcement fine-tuning process, enabling the model to generate reports that are both linguistically consistent and clinically aligned. Experiments on chest X-ray and carotid ultrasound datasets demonstrate that RadFlow consistently improves diagnostic coherence and overall report quality compared with state-of-the-art baselines.
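
To make the two-level reward concrete, the following is a minimal sketch of how a global report score and a local Impression score might be combined. It is an illustration only: the abstract does not state how the terms are weighted or aggregated, so the weighted mean, the additive combination, and the names `RewardWeights`, `global_reward`, and `hierarchical_reward` are all assumptions, and each input score is assumed to be precomputed in [0, 1].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical weights; the source does not specify how terms are combined.
    fluency: float = 1.0
    correctness: float = 1.0
    consistency: float = 1.0
    local: float = 1.0


def global_reward(fluency: float, correctness: float, consistency: float,
                  w: RewardWeights) -> float:
    """Weighted mean of the three report-level scores (assumed aggregation)."""
    total = w.fluency + w.correctness + w.consistency
    weighted = (w.fluency * fluency
                + w.correctness * correctness
                + w.consistency * consistency)
    return weighted / total


def hierarchical_reward(fluency: float, correctness: float, consistency: float,
                        impression_score: float,
                        w: RewardWeights = RewardWeights()) -> float:
    """Global report reward plus a weighted Impression-specific reward."""
    return (global_reward(fluency, correctness, consistency, w)
            + w.local * impression_score)
```

Raising `local` in this sketch would shift the reward toward Impression quality, which is one plausible way to express the abstract's emphasis on the Impression section.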

Paper Structure

This paper contains 20 sections, 14 equations, 3 figures, 6 tables.

Figures (3)

  • Figure 1: Conceptual illustration of RadFlow. Traditional models treat reports as flat token sequences, ignoring the structured logic between Findings and Impression. RadFlow introduces hierarchical rewards, where the global reward enforces descriptive and cross-sectional consistency, and the local reward enhances diagnostic reasoning.
  • Figure 2: Overview of the RadFlow framework, which translates radiologists’ structured reporting workflow into reinforcement fine-tuning. Given an image and prompt, the Policy Model generates candidate reports (e.g., $y_i$, $y_j$). A hierarchical reward guides optimization: (1) Global Report Reward jointly evaluates linguistic fluency, domain correctness, and cross-sectional consistency between Finding and Impression; (2) Local Impression Reward prioritizes diagnostic accuracy in the critical Impression section via an expert model. The Critical-Aware Policy Optimization module adaptively adjusts learning for high-risk cases (e.g., $\epsilon_i \downarrow \rightarrow$ cautious$\uparrow$), mimicking radiologists’ careful refinement under clinical uncertainty.
  • Figure 3: Comparison of generated medical reports under different training strategies. The figure highlights differences in linguistic fluency, diagnostic accuracy, and sectional consistency between MedVersa trained on CarotidUS-MRG and our proposed RadFlow method, demonstrating RadFlow’s superior diagnostic coherence and cross-sectional consistency.
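
The Figure 2 caption indicates that the clipping range $\epsilon_i$ shrinks for high-risk cases so that updates become more cautious. A minimal sketch of that idea on top of a standard PPO-style clipped surrogate is given below. The linear shrink rule, the `criticality` score in [0, 1], and the function names are assumptions made for illustration; the page does not give the actual CAPO formulation.

```python
def critical_epsilon(base_eps: float, criticality: float,
                     shrink: float = 0.5) -> float:
    """Per-sample clip range that narrows as criticality rises (assumed rule)."""
    if not 0.0 <= criticality <= 1.0:
        raise ValueError("criticality must lie in [0, 1]")
    return base_eps * (1.0 - shrink * criticality)


def clipped_surrogate(ratio: float, advantage: float, eps: float) -> float:
    """Standard PPO clipped objective for one sample with clip range eps."""
    clipped_ratio = max(1.0 - eps, min(ratio, 1.0 + eps))
    return min(ratio * advantage, clipped_ratio * advantage)


# Same policy ratio and advantage, routine versus critical case.
routine = clipped_surrogate(1.5, 1.0, critical_epsilon(0.2, 0.0))
critical = clipped_surrogate(1.5, 1.0, critical_epsilon(0.2, 1.0))
```

With these hypothetical settings the critical case is clipped at a ratio of 1.1 instead of 1.2, so the same sample contributes a smaller policy step.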