HIST-AID: Leveraging Historical Patient Reports for Enhanced Multi-Modal Automatic Diagnosis
Haoxu Huang, Cem M. Deniz, Kyunghyun Cho, Sumit Chopra, Divyam Madaan
TL;DR
This work introduces Temporal MIMIC, a longitudinal chest X-ray dataset that pairs five years of patient radiographs with clinical reports, enabling HIST-AID to leverage historical data for abnormality detection. HIST-AID uses pre-trained ViT and BERT encoders, time-series transformers, and a multi-modal fusion module (notably ViLT) to fuse past images and reports with current data, capturing temporal dynamics via Rotary Positional Encoding. Across 13 pathologies, the framework improves AUROC by $6.56\%$ and AUPRC by $9.51\%$ over baselines that rely solely on radiographic scans, and the gains persist across gender, age, and race groups. Very distant histories, however, can hurt performance. The findings highlight the clinical value of incorporating recent historical information (up to $30$ days) and the need to select historical data carefully for robust, equitable automatic diagnosis and decision support.
Abstract
Chest X-ray imaging is a widely accessible and non-invasive diagnostic tool for detecting thoracic abnormalities. While numerous AI models assist radiologists in interpreting these images, most overlook patients' historical data. To bridge this gap, we introduce the Temporal MIMIC dataset, which integrates five years of patient history, including radiographic scans and reports from MIMIC-CXR and MIMIC-IV, encompassing 12,221 patients and thirteen pathologies. Building on this, we present HIST-AID, a framework that improves automatic diagnostic accuracy using historical reports. HIST-AID emulates the radiologist's comprehensive approach by drawing on a patient's historical data when making a diagnosis. Our experiments demonstrate significant improvements, with AUROC increasing by 6.56% and AUPRC by 9.51% compared to models that rely solely on radiographic scans. These gains are consistent across demographic groups defined by gender, age, and race. We show that while recent data boost performance, older data may reduce accuracy due to changes in patient conditions. Our work demonstrates the potential of incorporating historical data for more reliable automatic diagnosis, providing critical support for clinical decision-making.
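
The observation that recent history helps while older history may hurt suggests restricting the historical reports passed to a model to a recent window. The following is a minimal sketch of such a filter, not the authors' implementation: the `Report` record, the `select_recent_history` function, and the default 30-day window (taken from the TL;DR) are all illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Report:
    """Hypothetical record for one historical radiology report."""
    study_time: datetime
    text: str


def select_recent_history(reports, current_time, window_days=30):
    """Keep reports that precede the current scan by at most `window_days`.

    Reports dated at or after the current scan are excluded. The result is
    sorted oldest first so it can be fed to a sequence model in time order.
    """
    window = timedelta(days=window_days)
    recent = [
        r for r in reports
        if timedelta(0) < current_time - r.study_time <= window
    ]
    return sorted(recent, key=lambda r: r.study_time)


# Illustrative data only; dates and texts are made up.
current = datetime(2020, 6, 30)
history = [
    Report(datetime(2019, 1, 5), "old report"),
    Report(datetime(2020, 6, 28), "report 2 days before"),
    Report(datetime(2020, 6, 10), "report 20 days before"),
    Report(datetime(2020, 7, 2), "later report, excluded"),
]
selected = select_recent_history(history, current)
print([r.text for r in selected])
```

In practice the window length would be treated as a hyperparameter and tuned on validation data rather than fixed in advance.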
