RAM-EHR: Retrieval Augmentation Meets Clinical Predictions on Electronic Health Records
Ran Xu, Wenqi Shi, Yue Yu, Yuchen Zhuang, Bowen Jin, May D. Wang, Joyce C. Ho, Carl Yang
TL;DR
RAM-EHR introduces a retrieval-augmented framework for EHR-based clinical predictions by aggregating multi-source external knowledge into a textual corpus, retrieving passages linked to medical codes with dense representations, and summarizing them via an LLM. The summarized knowledge is then fused with visit-level information through a co-training scheme between an augmented predictor and a local EHR model, guided by consistency regularization. Empirical results on MIMIC-III and Cradle show consistent gains over knowledge-enhanced baselines, with improvements of 3.4 percentage points in AUROC and 7.2 percentage points in AUPR on average, and ablations confirm the value of retrieval, summarization, and co-training. The approach is modular and can serve as a flexible plugin for diverse EHR backbones, offering a scalable path to incorporate broad external knowledge into clinical decision support.
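The co-training idea above can be illustrated with a small sketch. The summary does not give the exact objective, so everything here is an assumption made for illustration: a binary task, binary cross-entropy for each predictor, a symmetric KL term as the consistency regularizer, and the hypothetical names `co_training_loss` and `lam`.

```python
import math


def bce(p, y, eps=1e-12):
    """Binary cross-entropy for one predicted probability p and label y."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between two Bernoulli distributions with means p and q."""
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def co_training_loss(p_aug, p_local, y, lam=1.0):
    """Supervised loss for both predictors plus a consistency penalty.

    p_aug:   probability from the knowledge-augmented predictor (assumed)
    p_local: probability from the local EHR model (assumed)
    lam:     hypothetical weight on the consistency term
    """
    supervised = bce(p_aug, y) + bce(p_local, y)
    # Symmetric KL is one possible consistency term, chosen for illustration.
    consistency = 0.5 * (bernoulli_kl(p_aug, p_local) + bernoulli_kl(p_local, p_aug))
    return supervised + lam * consistency
```

When the two predictors agree, the consistency term vanishes and only the supervised losses remain; the more their predictions diverge, the larger the penalty, which nudges each model toward information the other has captured.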
Abstract
We present RAM-EHR, a Retrieval AugMentation pipeline to improve clinical predictions on Electronic Health Records (EHRs). RAM-EHR first collects multiple knowledge sources, converts them into text format, and uses dense retrieval to obtain information related to medical concepts. This strategy addresses the difficulties associated with complex names for the concepts. RAM-EHR then augments the local EHR predictive model, which is co-trained with consistency regularization so that the two models capture complementary information from patient visits and summarized knowledge. Experiments on two EHR datasets show the efficacy of RAM-EHR over previous knowledge-enhanced baselines (3.4% gain in AUROC and 7.2% gain in AUPR), emphasizing the effectiveness of the summarized knowledge from RAM-EHR for clinical prediction tasks. The code will be published at https://github.com/ritaranx/RAM-EHR.
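To make the dense retrieval step concrete, here is a minimal sketch. It assumes passages and the medical concept query have already been embedded by some encoder (the abstract does not name one) and that relevance is scored by cosine similarity. The three-dimensional vectors, the passage texts, and the helper `retrieve_top_k` are made up for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve_top_k(query_vec, passages, k=2):
    """Return the texts of the k passages most similar to the query embedding."""
    scored = [(cosine(query_vec, vec), text) for text, vec in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy corpus: (passage text, hypothetical precomputed embedding).
passages = [
    ("Metformin is a first-line drug for type 2 diabetes.", [0.9, 0.1, 0.0]),
    ("Atrial fibrillation is an irregular heart rhythm.", [0.0, 0.2, 0.9]),
    ("Insulin lowers blood glucose.", [0.8, 0.3, 0.1]),
]

# Stand-in for the embedding of a medical code's textual description.
query = [1.0, 0.2, 0.0]
top = retrieve_top_k(query, passages, k=2)
```

Because matching happens in embedding space rather than on surface strings, a concept with a long or unusual name can still be linked to relevant passages, which is the difficulty the abstract says dense retrieval addresses.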
