Longitudinal Data and a Semantic Similarity Reward for Chest X-Ray Report Generation
Aaron Nicolson, Jason Dowling, Bevan Koopman
TL;DR
This work tackles automated chest X-ray (CXR) report generation by mirroring the radiologist's workflow through longitudinal, multi-image conditioning and by introducing a reinforcement-learning reward based on CXR-BERT semantic similarity. The proposed CXRMate model conditions on all images of the current study and, when available, the previous study's report, differentiates report sections with section embeddings, and uses LoRA to efficiently incorporate prior prompts. A CXR-BERT cosine-similarity reward guides self-critical sequence training (SCST), yielding closer semantic alignment with radiologist reports than RadGraph-based rewards. The method performs strongly on the MIMIC-CXR and Open-i IU X-ray datasets, with the latter demonstrating its generalisability. The study also highlights evaluation and labeling issues in the literature and provides open-source code and checkpoints to promote reproducibility and further development in clinical CXR report generation.
Abstract
Radiologists face high burnout rates, partially due to the increasing volume of chest X-rays (CXRs) requiring interpretation and reporting. Automated CXR report generation holds promise for reducing this burden and improving patient care. While current models show potential, their diagnostic accuracy is limited. Our proposed CXR report generator integrates elements of the radiologist workflow and introduces a novel reward for reinforcement learning. Our approach leverages longitudinal data from a patient's prior CXR study and effectively handles cases where no prior study exists, thus mirroring the radiologist's workflow. In contrast, existing models typically lack this flexibility, often requiring prior studies for the model to function optimally. Our approach also incorporates all CXRs from a patient's study and distinguishes between report sections through section embeddings. Our reward for reinforcement learning leverages CXR-BERT, which forces our model to learn the clinical semantics of radiology reporting. We conduct experiments on publicly available datasets -- MIMIC-CXR and Open-i IU X-ray -- with metrics shown to more closely correlate with radiologists' assessment of reporting. Results from our study demonstrate that the proposed model generates reports that are more aligned with radiologists' reports than state-of-the-art models, such as those utilising large language models, reinforcement learning, and multi-task learning. The proposed model improves the diagnostic accuracy of CXR report generation, which could one day reduce radiologists' workload and enhance patient care. Our Hugging Face checkpoint (https://huggingface.co/aehrc/cxrmate) and code (https://github.com/aehrc/cxrmate) are publicly available.
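
To make the reward concrete, the following is a minimal sketch of how a cosine-similarity reward could drive a self-critical (SCST-style) update. It is an illustration under stated assumptions, not the authors' implementation: the embedding vectors stand in for CXR-BERT report embeddings, which are not computed here, and the function names `cosine_similarity`, `scst_advantage`, and `scst_loss` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def scst_advantage(sampled_emb, greedy_emb, reference_emb):
    """Reward of the sampled report minus the reward of the greedy baseline.

    Each argument is assumed to be an embedding of a whole report
    (placeholder for a CXR-BERT embedding); the reference is the
    radiologist's report.
    """
    sampled_reward = cosine_similarity(sampled_emb, reference_emb)
    baseline_reward = cosine_similarity(greedy_emb, reference_emb)
    return sampled_reward - baseline_reward


def scst_loss(sampled_log_prob, advantage):
    """Policy-gradient loss for one sample: -(advantage) * log p(sampled report)."""
    return -advantage * sampled_log_prob


# Toy embeddings: the sampled report matches the reference, the greedy one does not.
reference = [1.0, 0.0]
sampled = [1.0, 0.0]
greedy = [0.0, 1.0]

advantage = scst_advantage(sampled, greedy, reference)
loss = scst_loss(sampled_log_prob=-2.0, advantage=advantage)
```

In this toy case the sampled report earns a higher reward than the greedy baseline, so the advantage is positive and minimising the loss raises the log-probability of the sampled report. When the sampled report scores below the baseline, the sign flips and its probability is pushed down.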
