PriorRG: Prior-Guided Contrastive Pre-training and Coarse-to-Fine Decoding for Chest X-ray Report Generation
Kang Liu, Zhuoqi Ma, Zikang Fang, Yunan Li, Kun Xie, Qiguang Miao
TL;DR
PriorRG addresses the lack of patient-specific priors in chest X-ray report generation by jointly modeling clinical context (e.g., symptoms, medical history) and the most recent prior image. Its two-stage pipeline first pre-trains with prior-guided contrastive learning, which uses clinical context to guide spatiotemporal feature extraction and align visual features with report semantics. It then decodes reports with a prior-aware coarse-to-fine strategy that progressively fuses clinical priors, spatiotemporal cues, and hierarchical visual features from the vision encoder's hidden states. On MIMIC-CXR and MIMIC-ABN, PriorRG outperforms state-of-the-art methods on both natural language generation and clinical-accuracy metrics, with a 3.6% BLEU-4 and 3.8% F1 improvement on MIMIC-CXR and a 5.9% BLEU-1 gain on MIMIC-ABN, as well as gains on retrieval measures. The intended practical benefit is more accurate, fluent, and progression-aware preliminary radiology reports. The authors plan to release code and checkpoints upon acceptance.
Abstract
Chest X-ray report generation aims to reduce radiologists' workload by automatically producing high-quality preliminary reports. A critical yet underexplored aspect of this task is the effective use of patient-specific prior knowledge -- including clinical context (e.g., symptoms, medical history) and the most recent prior image -- which radiologists routinely rely on for diagnostic reasoning. Most existing methods generate reports from single images, neglecting this essential prior information and thus failing to capture diagnostic intent or disease progression. To bridge this gap, we propose PriorRG, a novel chest X-ray report generation framework that emulates real-world clinical workflows via a two-stage training pipeline. In Stage 1, we introduce a prior-guided contrastive pre-training scheme that leverages clinical context to guide spatiotemporal feature extraction, allowing the model to align more closely with the intrinsic spatiotemporal semantics in radiology reports. In Stage 2, we present a prior-aware coarse-to-fine decoding for report generation that progressively integrates patient-specific prior knowledge with the vision encoder's hidden states. This decoding allows the model to align with diagnostic focus and track disease progression, thereby enhancing the clinical accuracy and fluency of the generated reports. Extensive experiments on MIMIC-CXR and MIMIC-ABN datasets demonstrate that PriorRG outperforms state-of-the-art methods, achieving a 3.6% BLEU-4 and 3.8% F1 score improvement on MIMIC-CXR, and a 5.9% BLEU-1 gain on MIMIC-ABN. Code and checkpoints will be released upon acceptance.
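To make the Stage 1 idea concrete, the following is a minimal sketch of a symmetric contrastive (InfoNCE-style) objective that pulls each image embedding toward its paired report embedding and away from the other reports in the batch. The abstract does not specify the exact loss, so this form is an assumption. The function names, the temperature value, and the premise that each image embedding has already been fused with clinical context and prior-image features are all hypothetical choices made for illustration, not the authors' implementation.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def symmetric_info_nce(image_embs, report_embs, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    image_embs[k] and report_embs[k] are assumed to come from the same study.
    Assumption: image_embs are already prior-guided, i.e. fused with clinical
    context and prior-image features by some upstream encoder not shown here.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    reps = [l2_normalize(v) for v in report_embs]
    n = len(imgs)

    # Pairwise cosine similarities scaled by the temperature.
    logits = [
        [sum(a * b for a, b in zip(img, rep)) / temperature for rep in reps]
        for img in imgs
    ]

    # Image-to-report direction: each row should peak on the diagonal.
    i2r = -sum(log_softmax(logits[k])[k] for k in range(n)) / n

    # Report-to-image direction: same computation on the transposed matrix.
    columns = [[logits[r][c] for r in range(n)] for c in range(n)]
    r2i = -sum(log_softmax(columns[k])[k] for k in range(n)) / n

    return 0.5 * (i2r + r2i)


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    matched_reports = [[1.0, 0.0], [0.0, 1.0]]
    mismatched_reports = [[0.0, 1.0], [1.0, 0.0]]
    print("matched loss:", symmetric_info_nce(images, matched_reports))
    print("mismatched loss:", symmetric_info_nce(images, mismatched_reports))
```

In this toy batch the matched pairing yields a much smaller loss than the mismatched one, which is the behavior a contrastive alignment objective relies on. A real system would compute the embeddings with learned encoders and optimize the loss with an automatic differentiation framework.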
