HC-LLM: Historical-Constrained Large Language Models for Radiology Report Generation

Tengfei Liu, Jiapu Wang, Yongli Hu, Mingjie Li, Junfei Yi, Xiaojun Chang, Junbin Gao, Baocai Yin

TL;DR

The paper tackles radiology report generation by incorporating historical data to capture disease progression, addressing the limitations of single-image RRG. It introduces HC-LLM, which extracts time-shared and time-specific features from longitudinal chest X-rays and reports and enforces tri-consistency across intra- and inter-modality spaces using losses $\mathcal{L}_{sim}$, $\mathcal{L}_{con}$, and $\mathcal{L}_{stru}$ within the overall objective $\mathcal{L}_{total} = \mathcal{L}_{RRG} + \beta_1( \mathcal{L}_{sim}^{img} + \mathcal{L}_{sim}^{txt}) + \beta_2 \mathcal{L}_{con} + \beta_3 \mathcal{L}_{stru}$. The framework leverages a Swin Transformer visual encoder, an LLM-based text generator, and a configurable prompt $p_g$ to produce reports that align with disease evolution, achieving state-of-the-art results on Longitudinal-MIMIC and demonstrating robustness without historical data during testing. Extensive experiments, ablations, and cross-LLM analyses validate the effectiveness of the tri-consistency constraints in guiding longitudinal report generation. The work offers a practical paradigm for adapting general LLMs to sequential medical data and highlights future potential for incorporating more historical time points and broader multimodal inputs.
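To make the overall objective concrete, here is a minimal sketch of how the weighted sum could be assembled once the individual loss terms have been computed. The names `LossWeights` and `total_loss` and the default weight values are illustrative assumptions, not taken from the paper, and each loss term is treated as a plain scalar.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LossWeights:
    # Hypothetical default values; the summary above does not state the betas.
    beta1: float = 1.0  # weight on the intra-modality similarity terms
    beta2: float = 1.0  # weight on the multimodal contrastive term
    beta3: float = 1.0  # weight on the multimodal structural term


def total_loss(l_rrg, l_sim_img, l_sim_txt, l_con, l_stru, w=LossWeights()):
    """Combine scalar loss terms following
    L_total = L_RRG + beta1 * (L_sim^img + L_sim^txt) + beta2 * L_con + beta3 * L_stru.
    """
    return (
        l_rrg
        + w.beta1 * (l_sim_img + l_sim_txt)
        + w.beta2 * l_con
        + w.beta3 * l_stru
    )


# Example with made-up loss values and weights.
example = total_loss(2.0, 0.5, 0.25, 1.0, 4.0, LossWeights(2.0, 0.5, 0.25))
```

Setting a weight to zero removes the corresponding constraint from the sum, which is how an ablation over the three constraint groups could be expressed in this sketch.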

Abstract

Radiology report generation (RRG) models typically focus on individual exams, often overlooking the integration of historical visual or textual data, which is crucial for patient follow-ups. Traditional methods usually struggle with long sequence dependencies when incorporating historical information, but large language models (LLMs) excel at in-context learning, making them well-suited for analyzing longitudinal medical data. In light of this, we propose a novel Historical-Constrained Large Language Models (HC-LLM) framework for RRG, empowering LLMs with longitudinal report generation capabilities by constraining the consistency and differences between longitudinal images and their corresponding reports. Specifically, our approach extracts both time-shared and time-specific features from longitudinal chest X-rays and diagnostic reports to capture disease progression. Then, we ensure consistent representation by applying intra-modality similarity constraints and aligning various features across modalities with multimodal contrastive and structural constraints. These combined constraints effectively guide the LLMs in generating diagnostic reports that accurately reflect the progression of the disease, achieving state-of-the-art results on the Longitudinal-MIMIC dataset. Notably, our approach performs well even without historical data during testing and can be easily adapted to other multimodal large models, enhancing its versatility.

Paper Structure

This paper contains 19 sections, 20 equations, 8 figures, 4 tables.

Figures (8)

  • Figure 1: Illustration of the longitudinal report generation process. Unlike traditional methods that rely solely on the current chest X-ray, our approach emphasizes the effective utilization of historical diagnostic information to significantly enhance the accuracy of LLMs in RRG.
  • Figure 2: Overview of the proposed framework. First, the current chest X-ray is processed by a visual encoder and an LLM to generate a diagnostic report. The framework then extracts time-shared and time-specific features from the current and prior chest X-rays, along with the generated and prior diagnostic reports. Next, similarity constraints are applied to keep the time-shared representation consistent over time. Finally, multimodal contrastive and structural constraints align shared and specific features across modalities, ensuring the generated report accurately reflects disease progression.
  • Figure 3: An illustration of reports generated by different models using longitudinal images and the historical report. Brown denotes common content, while purple and blue indicate time-specific content. Underlined text marks incorrect predictions.
  • Figure 4: Visualization of feature distributions using t-SNE for the R2GenGPT and HC-LLM (Ours) models.
  • Figure 5: Performance comparison of BLEU-4 and ROUGE-L scores for the R2GenGPT and HC-LLM (Ours) models across different LLMs.
  • ...and 3 more figures