
The Impact of Auxiliary Patient Data on Automated Chest X-Ray Report Generation and How to Incorporate It

Aaron Nicolson, Shengyao Zhuang, Jason Dowling, Bevan Koopman

TL;DR

The paper examines how integrating auxiliary patient data from emergency department records with chest X-ray images can improve automated CXR report generation. It proposes a novel embedding strategy that converts heterogeneous data (numerical, categorical, textual, temporal, and image) into embeddings that prompt a multimodal LLM, combining them with time-delta, source, and position embeddings. After linking MIMIC-CXR with MIMIC-IV-ED and training in three stages, including a reinforcement learning stage, the method achieves statistically significant gains on radiology report generation metrics and on CheXpert label performance, highlighting the value of broader clinical context for diagnostic accuracy. The authors suggest practical benefits for radiology workflows and patient outcomes, and outline limitations and directions for future work, including scalability, interpretability, and evaluation of clinical outcomes.
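The embedding strategy summarised above can be illustrated with a toy sketch. It assumes that each record contributes one or more value embeddings (a linear projection for numerical values, a lookup table for categorical values, and word-level lookups standing in for the language model's token embeddings for text), and that learned source, time-delta, and position embeddings are added element-wise to each of them. The 8-dimensional width, the signed log scaling of the time delta, the helper names, and the sample records are hypothetical choices for illustration, not the paper's implementation; image embeddings are omitted.

```python
import math
import random
from dataclasses import dataclass

DIM = 8  # toy width; a real model would use the language model's hidden size
MAX_POSITIONS = 128  # sequences longer than this would raise IndexError below
_rng = random.Random(0)


def _init() -> list[float]:
    """Random stand-in for a learned embedding vector."""
    return [_rng.gauss(0.0, 0.02) for _ in range(DIM)]


@dataclass
class Record:
    source: str         # e.g. a MIMIC-IV-ED table name such as "vitalsign"
    kind: str           # "numerical", "categorical" or "text"
    value: object
    delta_hours: float  # record time minus CXR exam time (hypothetical unit)


# Hypothetical learned parameters, randomly initialised here.
SOURCE_EMB: dict[str, list[float]] = {}
TOKEN_EMB: dict[str, list[float]] = {}  # shared by categorical values and text words
NUMERIC_W = _init()                     # projects a scalar value to DIM
TIME_W = _init()                        # projects a scaled time delta to DIM
POSITION_EMB = [_init() for _ in range(MAX_POSITIONS)]


def _lookup(table: dict[str, list[float]], key: str) -> list[float]:
    if key not in table:
        table[key] = _init()
    return table[key]


def value_embeddings(rec: Record) -> list[list[float]]:
    """One or more DIM-sized vectors encoding the record's value."""
    if rec.kind == "numerical":
        # Values are used unnormalised here for brevity.
        return [[w * float(rec.value) for w in NUMERIC_W]]
    if rec.kind == "categorical":
        return [_lookup(TOKEN_EMB, f"{rec.source}={rec.value}")]
    if rec.kind == "text":
        return [_lookup(TOKEN_EMB, word) for word in str(rec.value).lower().split()]
    raise ValueError(f"unsupported kind: {rec.kind}")


def embed_patient_data(records: list[Record]) -> list[list[float]]:
    """Sum value, source, time-delta and position embeddings for every token."""
    sequence = []
    for rec in records:
        source_vec = _lookup(SOURCE_EMB, rec.source)
        # Signed log scaling keeps large time deltas in a small range (assumption).
        scaled_dt = math.copysign(math.log1p(abs(rec.delta_hours)), rec.delta_hours)
        time_vec = [w * scaled_dt for w in TIME_W]
        for vec in value_embeddings(rec):
            pos_vec = POSITION_EMB[len(sequence)]
            sequence.append([v + s + t + p for v, s, t, p in
                             zip(vec, source_vec, time_vec, pos_vec)])
    return sequence


records = [
    Record("vitalsign", "numerical", 112.0, -3.0),       # heart rate
    Record("edstays", "categorical", "AMBULANCE", -5.0),  # arrival transport
    Record("triage", "text", "Shortness of breath", -4.5),
]
prompt = embed_patient_data(records)
print(len(prompt), len(prompt[0]))  # 5 8
```

Because every token carries its own source and time-delta embedding, the model can in principle distinguish, for example, a vital sign taken minutes before the exam from one taken hours earlier, even though both share the same value projection.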

Abstract

This study investigates the integration of diverse patient data sources into multimodal language models for automated chest X-ray (CXR) report generation. Traditionally, CXR report generation relies solely on CXR images and limited radiology data, overlooking valuable information from patient health records, particularly from emergency departments. Utilising the MIMIC-CXR and MIMIC-IV-ED datasets, we incorporate detailed patient information such as vital signs, medicines, and clinical history to enhance diagnostic accuracy. We introduce a novel approach to transform these heterogeneous data sources into embeddings that prompt a multimodal language model; this significantly enhances the diagnostic accuracy of generated radiology reports. Our comprehensive evaluation demonstrates the benefits of using a broader set of patient data, underscoring the potential for enhanced diagnostic capabilities and better patient outcomes through the integration of multimodal data in CXR report generation.
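Using both datasets requires matching each CXR study to an emergency department stay. A minimal sketch, assuming a study is linked to an ED stay when both share a `subject_id` and the study time falls between the stay's `intime` and `outtime` (columns of the MIMIC-IV-ED `edstays` table); the matching rule, the in-memory record types, and the sample values are illustrative assumptions rather than the paper's exact linking procedure.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class EDStay:
    subject_id: int
    stay_id: int
    intime: datetime
    outtime: datetime


@dataclass
class CXRStudy:
    subject_id: int
    study_id: int
    study_time: datetime


def link_studies_to_stays(studies: list[CXRStudy], stays: list[EDStay]) -> dict[int, int]:
    """Map study_id -> stay_id for studies acquired during an ED stay (assumed rule)."""
    stays_by_subject: dict[int, list[EDStay]] = {}
    for stay in stays:
        stays_by_subject.setdefault(stay.subject_id, []).append(stay)

    links = {}
    for study in studies:
        for stay in stays_by_subject.get(study.subject_id, []):
            if stay.intime <= study.study_time <= stay.outtime:
                links[study.study_id] = stay.stay_id
                break  # keep the first matching stay
    return links


stays = [EDStay(1, 100, datetime(2180, 5, 1, 8, 0), datetime(2180, 5, 1, 16, 0))]
studies = [
    CXRStudy(1, 900, datetime(2180, 5, 1, 9, 30)),  # during the stay
    CXRStudy(1, 901, datetime(2180, 6, 2, 10, 0)),  # outside any stay
]
print(link_studies_to_stays(studies, stays))  # {900: 100}
```

Studies with no matching stay simply receive no auxiliary ED data, so a model trained this way would need to handle exams with and without emergency department context.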

Paper Structure

This paper contains 33 sections, 3 equations, 7 figures, and 14 tables.

Figures (7)

  • Figure 1: The patient data from MIMIC-IV-ED associated with a CXR exam from MIMIC-CXR. This includes the exam's images, the corresponding radiology report, and the associated image metadata. The findings and impression sections of the radiology report form the ground truth for CXR report generation. Emergency-specific data, such as reconciled medicines and aperiodic vital signs, are also available for the patient.
  • Figure 2: Multimodal language model for CXR report generation. The patient data embeddings prompt the decoder to generate the findings and impression sections of a radiology report.
  • Figure 3: Proposed patient data embeddings from the multiple heterogeneous data types taken from MIMIC-IV-ED and MIMIC-CXR. The embeddings are formed from numerical, categorical, textual, temporal, and image data.
  • Figure 4: Case study demonstrating how incorporating auxiliary patient data can aid with report generation.
  • Figure D.1: Attention mask for the decoder. Non-causal masking was used for the patient data embeddings and causal masking for the report token embeddings.
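The decoder input described in Figures 2 and D.1 places the patient data embeddings before the report token embeddings, and the attention mask can be sketched as a prefix-style mask: patient data positions attend to one another without a causal constraint, while each report token attends to every patient data position and to the report tokens up to and including itself. This sketch assumes that patient data positions do not attend to report tokens; the function name is hypothetical.

```python
def prefix_attention_mask(num_patient: int, num_report: int) -> list[list[bool]]:
    """mask[i][j] is True when query position i may attend to key position j.

    Positions [0, num_patient) hold patient data embeddings (non-causal block);
    positions [num_patient, num_patient + num_report) hold report tokens (causal).
    """
    total = num_patient + num_report
    mask = []
    for i in range(total):
        if i < num_patient:
            # Patient data embeddings see the whole patient data prefix only.
            row = [j < num_patient for j in range(total)]
        else:
            # Report tokens see the prefix plus earlier (and current) report tokens.
            row = [j <= i for j in range(total)]
        mask.append(row)
    return mask


for row in prefix_attention_mask(num_patient=2, num_report=3):
    print("".join("1" if allowed else "." for allowed in row))
# 11...
# 11...
# 111..
# 1111.
# 11111
```

The printed pattern shows the fully connected 2×2 block for the patient data followed by the lower-triangular block for the report tokens, so generation of each report token can condition on all of the patient data.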