HC-LLM: Historical-Constrained Large Language Models for Radiology Report Generation
Tengfei Liu, Jiapu Wang, Yongli Hu, Mingjie Li, Junfei Yi, Xiaojun Chang, Junbin Gao, Baocai Yin
TL;DR
The paper tackles radiology report generation (RRG) by incorporating historical data to capture disease progression, addressing the limitations of single-image RRG. It introduces HC-LLM, which extracts time-shared and time-specific features from longitudinal chest X-rays and reports and enforces tri-consistency across intra- and inter-modality spaces using the losses $\mathcal{L}_{sim}$, $\mathcal{L}_{con}$, and $\mathcal{L}_{stru}$ within the overall objective $\mathcal{L}_{total} = \mathcal{L}_{RRG} + \beta_1 (\mathcal{L}_{sim}^{img} + \mathcal{L}_{sim}^{txt}) + \beta_2 \mathcal{L}_{con} + \beta_3 \mathcal{L}_{stru}$. The framework combines a Swin Transformer visual encoder, an LLM-based text generator, and a configurable prompt $p_g$ to produce reports that align with disease evolution. It achieves state-of-the-art results on Longitudinal-MIMIC and remains robust when no historical data is available at test time. Extensive experiments, ablations, and cross-LLM analyses validate the effectiveness of the tri-consistency constraints in guiding longitudinal report generation. The work offers a practical paradigm for adapting general LLMs to sequential medical data and points to future extensions that incorporate more historical time points and broader multimodal inputs.
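As a minimal sketch of how the overall objective combines its terms, the snippet below computes $\mathcal{L}_{total}$ from already-evaluated scalar losses. The names `LossWeights` and `total_loss` and the default weight values are illustrative assumptions, not taken from the paper; the individual losses are treated as plain numbers here rather than computed from model features.

```python
from dataclasses import dataclass


@dataclass
class LossWeights:
    """Trade-off weights beta_1..beta_3 (default values are hypothetical)."""
    beta1: float = 1.0  # weight on the intra-modality similarity losses
    beta2: float = 1.0  # weight on the multimodal contrastive loss
    beta3: float = 1.0  # weight on the multimodal structural loss


def total_loss(l_rrg: float, l_sim_img: float, l_sim_txt: float,
               l_con: float, l_stru: float, w: LossWeights) -> float:
    """L_total = L_RRG + beta1*(L_sim^img + L_sim^txt) + beta2*L_con + beta3*L_stru."""
    return (l_rrg
            + w.beta1 * (l_sim_img + l_sim_txt)
            + w.beta2 * l_con
            + w.beta3 * l_stru)


if __name__ == "__main__":
    # Made-up loss values, used only to exercise the weighted sum.
    weights = LossWeights(beta1=0.5, beta2=0.25, beta3=0.25)
    print(total_loss(2.0, 0.5, 0.5, 1.0, 1.0, weights))
```

Setting all three weights to zero reduces the objective to the report-generation loss $\mathcal{L}_{RRG}$ alone.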
Abstract
Radiology report generation (RRG) models typically focus on individual exams, often overlooking the integration of historical visual or textual data, which is crucial for patient follow-ups. Traditional methods usually struggle with long sequence dependencies when incorporating historical information, but large language models (LLMs) excel at in-context learning, making them well-suited for analyzing longitudinal medical data. In light of this, we propose a novel Historical-Constrained Large Language Models (HC-LLM) framework for RRG, empowering LLMs with longitudinal report generation capabilities by constraining the consistency and differences between longitudinal images and their corresponding reports. Specifically, our approach extracts both time-shared and time-specific features from longitudinal chest X-rays and diagnostic reports to capture disease progression. Then, we ensure consistent representation by applying intra-modality similarity constraints and aligning various features across modalities with multimodal contrastive and structural constraints. These combined constraints effectively guide the LLMs in generating diagnostic reports that accurately reflect the progression of the disease, achieving state-of-the-art results on the Longitudinal-MIMIC dataset. Notably, our approach performs well even without historical data during testing and can be easily adapted to other multimodal large models, enhancing its versatility.
