HERGen: Elevating Radiology Report Generation with Longitudinal Data

Fuying Wang, Shenghui Du, Lequan Yu

TL;DR

HERGen is a History Enhanced Radiology Report Generation framework that uses a group causal transformer to integrate longitudinal chest X-ray histories, together with a cross-modal contrastive objective and curriculum learning to stabilize training. It reports state-of-the-art performance in radiology report generation and temporal disease progression classification on MIMIC-CXR, Longitudinal MIMIC-CXR, and MS-CXR-T, outperforming single-image baselines and previous longitudinal methods. The approach yields temporally coherent reports that reflect disease evolution, and embedding visualizations provide evidence of improved feature alignment. Code is released at https://github.com/fuying-wang/HERGen.

Abstract

Radiology reports provide detailed descriptions of medical imaging integrated with patients' medical histories, but writing them is traditionally labor-intensive, increasing radiologists' workload and the risk of diagnostic errors. Recent efforts to automate this process seek to mitigate these issues by reducing errors and streamlining clinical workflows. However, existing automated approaches are based on a single timestamp and often neglect the critical temporal aspect of patients' imaging histories, which is essential for accurate longitudinal analysis. To address this gap, we propose a novel History Enhanced Radiology Report Generation (HERGen) framework that employs a group causal transformer to efficiently integrate longitudinal data across patient visits. Our approach not only allows for comprehensive analysis of varied historical data but also improves the quality of generated reports through an auxiliary contrastive objective that aligns image sequences with their corresponding reports. More importantly, we introduce a curriculum learning-based strategy to handle the inherent complexity of longitudinal radiology data and thus stabilize the optimization of our framework. Extensive evaluations across three datasets demonstrate that our framework surpasses existing methods in generating accurate radiology reports and effectively predicting disease progression from medical images.
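
The abstract does not spell out the form of the auxiliary contrastive objective. As a rough illustration only, the sketch below assumes a CLIP-style symmetric InfoNCE loss over paired image and report embeddings; the function names, the temperature value, and the toy embeddings are all hypothetical and are not taken from the HERGen code.

```python
import math


def _normalize(v):
    # Scale a non-zero vector to unit length so dot products are cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _cross_entropy_diag(logits):
    # Mean cross-entropy where the correct "class" for row i is column i.
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(image_embs, text_embs, temperature=0.07):
    # Assumption: image_embs[i] and text_embs[i] come from the same study.
    img = [_normalize(v) for v in image_embs]
    txt = [_normalize(v) for v in text_embs]
    logits = [
        [sum(a * b for a, b in zip(i, t)) / temperature for t in txt]
        for i in img
    ]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-image direction
    return 0.5 * (_cross_entropy_diag(logits) + _cross_entropy_diag(logits_t))


# Toy check: matched pairs should score a lower loss than mismatched pairs.
images = [[1.0, 0.0], [0.0, 1.0]]
aligned = symmetric_contrastive_loss(images, [[1.0, 0.0], [0.0, 1.0]])
swapped = symmetric_contrastive_loss(images, [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)
```

In this toy setup the loss is near zero when each image embedding matches its own report embedding and large when the pairs are swapped, which is the behavior an alignment objective of this kind is meant to reward.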

Paper Structure

This paper contains 14 sections, 5 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: Overview of our HERGen for radiology report generation: Our model processes longitudinal data for each patient and utilizes the comprehensive historical information within these longitudinal data to generate robust and precise radiology reports.
  • Figure 2: History Enhanced Radiology Report Generation (HERGen): the framework encodes patient-level chest X-rays with CvT* (CvT combined with the encoder projection layer) and then aggregates temporal information through a group causal transformer. GPT2 serves as the decoder that predicts the radiology report and is optimized with a cross-entropy (CE) loss. Additionally, an auxiliary contrastive alignment module is employed to enhance the alignment of the latent spaces between image and text modalities, thereby producing more consistent reports. Note that in the group causal transformer block, thick lines represent image-level interactions, while thin lines indicate token-level interactions.
  • Figure 3: Comparison between (a) bidirectional attention, (b) causal attention and (c) our group causal attention.
  • Figure 4: Illustration of the proposed curriculum training strategy.
  • Figure 5: Embedding visualization of MIMIC-CXR images in CvT-212DistilGPT2 and our model with t-SNE.
  • ...and 1 more figures
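
The three attention patterns compared in Figure 3 can be sketched as boolean masks. The captions above do not define group causal attention precisely, so this is a minimal sketch under one assumption: tokens belonging to the same image attend to each other freely, while attention across images is limited to the same or earlier images in the patient's history. The helper names are hypothetical.

```python
def bidirectional_mask(n):
    # (a) Every token may attend to every other token.
    return [[True] * n for _ in range(n)]


def causal_mask(n):
    # (b) Token i may attend only to tokens at positions <= i.
    return [[j <= i for j in range(n)] for i in range(n)]


def group_causal_mask(group_ids):
    # (c) Assumed behavior: token i may attend to token j when j's image is
    # the same as, or earlier than, i's image. group_ids[k] is the
    # chronological index of the image that token k belongs to.
    return [[gj <= gi for gj in group_ids] for gi in group_ids]


# Two images (visits) with two visual tokens each.
group_ids = [0, 0, 1, 1]
mask = group_causal_mask(group_ids)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
# prints:
# xx..
# xx..
# xxxx
# xxxx
```

Under this assumption, tokens of the first image see only each other, while tokens of the second image see both images; a plain causal mask would additionally hide later tokens within the same image.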