HIST-AID: Leveraging Historical Patient Reports for Enhanced Multi-Modal Automatic Diagnosis

Haoxu Huang, Cem M. Deniz, Kyunghyun Cho, Sumit Chopra, Divyam Madaan

TL;DR

This work introduces Temporal MIMIC, a longitudinal chest X-ray dataset pairing five years of radiographs with clinical reports, enabling HIST-AID to leverage historical data for abnormality detection. HIST-AID uses pre-trained ViT and BERT encoders, time-series transformers, and a multi-modal fusion module (notably ViLT) to fuse past images and reports with current data, capturing temporal dynamics via Rotary Positional Encoding. Across 13 pathologies, the framework achieves substantial improvements in AUROC ($6.56\%$) and AUPRC ($9.51\%$) over current-scan baselines, with gains persisting across gender, age, and race groups; however, very distant histories can hurt performance. The findings highlight the clinical value of incorporating recent historical information (up to $30$ days) while emphasizing careful selection of historical data for robust, equitable automatic diagnosis and decision support.
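The rotary positional encoding mentioned above can be illustrated with a minimal, pure-Python sketch. It is not the authors' implementation: the function names, the `base` value of 10000, and the use of an integer position (for example, the index of a report in the patient timeline) are assumptions made for illustration. The sketch shows the defining property of rotary encodings, namely that the dot product between an encoded query and an encoded key depends only on the difference between their positions.

```python
import math


def rotary_encode(vec, position, base=10000.0):
    """Rotate consecutive feature pairs of `vec` by position-dependent angles.

    `position` and `base` are illustrative assumptions, not values from the paper.
    """
    if len(vec) % 2 != 0:
        raise ValueError("feature dimension must be even")
    dim = len(vec)
    out = []
    for i in range(0, dim, 2):
        # Each pair (i, i + 1) gets its own rotation frequency.
        theta = position * base ** (-i / dim)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * cos_t - y * sin_t, x * sin_t + y * cos_t])
    return out


def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical query/key features for two entries in a patient timeline.
query = [1.0, 2.0, 3.0, 4.0]
key = [0.5, -1.0, 2.0, 0.0]

# Same position gap (2) at different absolute positions gives the same score.
score_near = dot(rotary_encode(query, 3), rotary_encode(key, 1))
score_far = dot(rotary_encode(query, 12), rotary_encode(key, 10))
```

Because the attention score depends only on the gap between positions, a transformer using this encoding can treat "two steps earlier" consistently regardless of where in the history the current scan falls.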

Abstract

Chest X-ray imaging is a widely accessible and non-invasive diagnostic tool for detecting thoracic abnormalities. While numerous AI models assist radiologists in interpreting these images, most overlook patients' historical data. To bridge this gap, we introduce the Temporal MIMIC dataset, which integrates five years of patient history, including radiographic scans and reports from MIMIC-CXR and MIMIC-IV, encompassing 12,221 patients and thirteen pathologies. Building on this, we present HIST-AID, a framework that improves automatic diagnostic accuracy by emulating the radiologist's comprehensive approach of consulting historical reports alongside the current scan. Our experiments demonstrate significant improvements, with AUROC increasing by 6.56% and AUPRC by 9.51% compared to models that rely solely on radiographic scans. These gains were consistently observed across diverse demographic groups, including variations in gender, age, and racial categories. We show that while recent data boost performance, older data may reduce accuracy due to changes in patient conditions. Our work highlights the potential of incorporating historical data for more reliable automatic diagnosis, providing critical support for clinical decision-making.
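The observation that recent reports help while older ones may hurt suggests filtering a patient's history by age before feeding it to a model. The following is a minimal sketch of such a filter, not the authors' pipeline: the function name `select_recent_reports`, the `(date, text)` tuple layout, and the default window of 30 days (taken from the figure stated in the TL;DR) are assumptions. The report dated on the current day is excluded, since the final report is what supplies the ground-truth labels.

```python
from datetime import date, timedelta


def select_recent_reports(reports, current_date, max_age_days=30):
    """Return reports dated within `max_age_days` before `current_date`, oldest first.

    `reports` is a list of hypothetical (report_date, report_text) tuples.
    Reports dated on or after `current_date` are dropped so the label-bearing
    final report never leaks into the model input.
    """
    cutoff = current_date - timedelta(days=max_age_days)
    recent = [r for r in reports if cutoff <= r[0] < current_date]
    return sorted(recent, key=lambda r: r[0])


# Hypothetical patient history; the texts are placeholders, not real data.
history = [
    (date(2020, 1, 1), "older report"),
    (date(2020, 5, 20), "recent report"),
    (date(2020, 6, 1), "final report"),
]

selected = select_recent_reports(history, current_date=date(2020, 6, 1))
```

Here only the report from 20 May survives: the January report falls outside the window and the final report is held out as the label source. The window length is a tunable parameter rather than a fixed rule.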

Paper Structure

This paper contains 33 sections, 2 equations, 19 figures, 3 tables.

Figures (19)

  • Figure 1: Temporal MIMIC Dataset: The dataset consists of radiographic scans and corresponding radiology reports collected over a span of five years, providing a comprehensive view of the progression of patient conditions over time. The final report, highlighted in red, is used to obtain the ground-truth labels for the patient's current condition.
  • Figure 3: AUROC comparison between different models for 13 pathologies. We compare the image-only model (blue, left bars), which uses images from the current timestamp, with HIST-AID (red, right bars), which integrates both current images and past textual data for diagnosis. Using both the current scan and the text of historical reports improves AUROC across all pathologies.
  • Figure 4: AUROC comparison between the model trained on the current scan and HIST-AID across different demographic groups. Our model (red, right) consistently outperforms the model trained on current scan images (blue, left) across gender, age groups, and racial categories. The error bars represent standard deviations calculated over five independent runs.
  • Figure 5: Impact of increasing the number of reports on AUROC performance: The performance of the temporal multi-modal model improves as the number of reports increases, surpassing the model that relies solely on current timestamp images.
  • Figure 6: Impact of report timing relative to final diagnosis on AUROC performance: The performance of the temporal multi-modal model improves when utilizing reports from up to the last 30 days, but it declines with reports from more distant periods, cautioning against the use of older data.
  • ...and 14 more figures