Longitudinal Data and a Semantic Similarity Reward for Chest X-Ray Report Generation

Aaron Nicolson; Jason Dowling; Bevan Koopman

TL;DR

This work tackles automated chest X-ray (CXR) report generation by mirroring the radiologist's workflow through longitudinal, multi-image conditioning and by introducing a reinforcement-learning reward based on CXR-BERT semantic similarity. The proposed CXRMate model conditions on all images of the current study and, when available, the report of the previous study. It differentiates report sections with section embeddings and uses LoRA to efficiently incorporate the previous report as a prompt. A CXR-BERT cosine-similarity reward guides self-critical sequence training (SCST), yielding closer semantic alignment with radiologist reports than RadGraph-based rewards. The method performs strongly on MIMIC-CXR and Open-i IU X-ray, with the latter demonstrating its generalisability. The study also highlights evaluation and labelling issues in the literature and provides open-source code and checkpoints to promote reproducibility and further development in clinical CXR report generation.
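The reward described above can be illustrated with a minimal sketch. It assumes the standard SCST formulation, in which the reward of a greedily decoded report serves as the baseline for a sampled report. The embeddings are plain Python lists standing in for CXR-BERT outputs, and the function names `cosine_similarity` and `scst_loss` are hypothetical, not taken from the authors' code.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # Assumption: treat a zero vector as having no similarity.
    return dot / (norm_u * norm_v)


def scst_loss(sample_log_probs, sample_emb, greedy_emb, reference_emb):
    """Self-critical loss for a single report (illustrative only).

    sample_log_probs: per-token log-probabilities of the sampled report.
    sample_emb, greedy_emb, reference_emb: stand-ins for the embeddings of
    the sampled, greedily decoded, and radiologist reports.
    """
    reward_sample = cosine_similarity(sample_emb, reference_emb)
    reward_greedy = cosine_similarity(greedy_emb, reference_emb)
    # The greedy reward is the baseline: a sampled report is reinforced only
    # when it is semantically closer to the reference than the greedy one.
    advantage = reward_sample - reward_greedy
    return -advantage * sum(sample_log_probs)
```

When the sampled report scores higher than the greedy baseline, minimising this loss raises the log-probability of the sampled tokens; when it scores lower, the log-probability is pushed down.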

Abstract

Radiologists face high burnout rates, partially due to the increasing volume of chest X-rays (CXRs) requiring interpretation and reporting. Automated CXR report generation holds promise for reducing this burden and improving patient care. While current models show potential, their diagnostic accuracy is limited. Our proposed CXR report generator integrates elements of the radiologist workflow and introduces a novel reward for reinforcement learning. Our approach leverages longitudinal data from a patient's prior CXR study and effectively handles cases where no prior study exists, thus mirroring the radiologist's workflow. In contrast, existing models typically lack this flexibility, often requiring prior studies for the model to function optimally. Our approach also incorporates all CXRs from a patient's study and distinguishes between report sections through section embeddings. Our reward for reinforcement learning leverages CXR-BERT, which forces our model to learn the clinical semantics of radiology reporting. We conduct experiments on publicly available datasets (MIMIC-CXR and Open-i IU X-ray) with metrics shown to more closely correlate with radiologists' assessment of reporting. Results from our study demonstrate that the proposed model generates reports that are more aligned with radiologists' reports than state-of-the-art models, such as those utilising large language models, reinforcement learning, and multi-task learning. The proposed model improves the diagnostic accuracy of CXR report generation, which could one day reduce radiologists' workload and enhance patient care. Our Hugging Face checkpoint (https://huggingface.co/aehrc/cxrmate) and code (https://github.com/aehrc/cxrmate) are publicly available.
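One way to picture the handling of a missing prior study is the sketch below, which builds the previous-report prompt as a list of (text, section identifier) pairs. It is an illustration under stated assumptions, not the authors' implementation: the section identifiers, the `[NO_PRIOR]` placeholder, and the `build_prompt` function are all hypothetical, and the abstract does not say how the model actually represents an absent prior study.

```python
from typing import List, Optional, Tuple

# Hypothetical section identifiers; in a real model each would index a
# learned section embedding added to the token embeddings of that section.
PREV_FINDINGS = 0
PREV_IMPRESSION = 1


def build_prompt(
    prev_findings: Optional[str] = None,
    prev_impression: Optional[str] = None,
    placeholder: str = "[NO_PRIOR]",  # Assumed placeholder, not from the paper.
) -> List[Tuple[str, int]]:
    """Return (text, section_id) pairs for the previous study's report.

    A missing or empty section is replaced by the placeholder, so the prompt
    has the same shape whether or not a prior study is available.
    """
    sections = [
        (prev_findings, PREV_FINDINGS),
        (prev_impression, PREV_IMPRESSION),
    ]
    prompt = []
    for text, section_id in sections:
        cleaned = text.strip() if text else ""
        prompt.append((cleaned if cleaned else placeholder, section_id))
    return prompt
```

Keeping the prompt shape fixed means the same decoder can be used for a patient's first study and for follow-up studies, which is the flexibility the abstract contrasts with models that require a prior study.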

Paper Structure (24 sections, 8 figures, 12 tables)

Introduction
Background
Datasets
Related Work
Methods
Longitudinal, Multi-image CXR Report Generation
CXR-BERT Cosine Similarity Reward
SCST With the Generated Report From the Previous Study as the Prompt
Section Embeddings and Issues With Labels in the Literature
Experiment Setup
Dataset splitting and formatting
Model
Training
Comparison Models and Rewards
Metrics
...and 9 more sections

Figures (8)

Figure 1: A patient can have multiple CXR studies over time. Each study can consist of multiple images, often representing different views of the chest. Note that the year of each study has been modified for anonymisation purposes.
Figure 2: CXR report generation conditioned on A: a single image of a study, B: all images of a study, and C: all images of a study, as well as the report of the previous study.
Figure 3: Histograms of the training split of MIMIC-CXR (Johnson et al., 2019). Top: multiple images are often taken for a single CXR study, thus motivating multi-image CXR report generation. Bottom: a patient often has multiple CXR studies over time, thus motivating our longitudinal, multi-image CXR report generation approach.
Figure 4: Our proposed model: a longitudinal, multi-image CXR report generator trained with reinforcement learning using the CXR-BERT cosine similarity reward. The findings and impression sections from the reports of the current and previous studies are differentiated by section embeddings and separator tokens. The prompt is the report of the previous study. The model is still able to generate a diagnostically accurate report even when the previous report is not available.
Figure 5: Results for the different conditioning strategies of Figure 2. The error bars indicate the mean and standard deviation over three training runs. Dotted lines indicate a significant difference between the scores of two methods ($p<0.05$, $n=4\,872; ~1\,624~{\rm studies} \times 3~{\rm runs}$). $\pmb w^{t-1}$ indicates the radiologist report as the prompt, while $\hat{\pmb w}^{t-1}$ indicates the generated report as the prompt.
...and 3 more figures
