RAM-EHR: Retrieval Augmentation Meets Clinical Predictions on Electronic Health Records

Ran Xu, Wenqi Shi, Yue Yu, Yuchen Zhuang, Bowen Jin, May D. Wang, Joyce C. Ho, Carl Yang

TL;DR

RAM-EHR introduces a retrieval-augmented framework for EHR-based clinical predictions by aggregating multi-source external knowledge into a textual corpus, retrieving passages linked to medical codes with dense representations, and summarizing them via an LLM. The summarized knowledge is then fused with visit-level information through a co-training scheme between an augmented predictor and a local EHR model, guided by consistency regularization. Empirical results on MIMIC-III and Cradle show consistent gains over knowledge-enhanced baselines, with improvements of 3.4 percentage points in AUROC and 7.2 percentage points in AUPR on average, and ablations confirm the value of retrieval, summarization, and co-training. The approach is modular and can serve as a flexible plugin for diverse EHR backbones, offering a scalable path to incorporate broad external knowledge into clinical decision support.
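The retrieval step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `embed` function is a hashed bag-of-words stand-in for a learned dense encoder, and the function names, the toy corpus, and the query are all hypothetical. It only shows the general pattern of scoring passages against a medical-concept query by inner product and keeping the top `k`.

```python
import zlib
import numpy as np

def embed(text, dim=4096):
    """Toy stand-in for a dense encoder (assumption): hashed bag-of-words, L2-normalized."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        token = token.strip(".,;:()")
        if token:
            # crc32 gives a hash that is stable across runs, unlike built-in hash()
            vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

def retrieve(query, corpus, k=2):
    """Return (passage_index, score) pairs for the k highest-scoring passages."""
    query_vec = embed(query)
    passage_matrix = np.stack([embed(passage) for passage in corpus])
    scores = passage_matrix @ query_vec  # cosine similarity, since vectors are unit-norm
    top = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in top]

# Hypothetical textual corpus built from external knowledge sources
corpus = [
    "Metformin is an oral medication used to treat type 2 diabetes.",
    "Appendectomy is the surgical removal of the appendix.",
    "Hypertension is a condition of persistently elevated blood pressure.",
]
hits = retrieve("metformin for type 2 diabetes", corpus, k=2)
for index, score in hits:
    print(f"{score:.3f}  {corpus[index]}")
```

In a real pipeline the retrieved passages would then be passed to an LLM for summarization; that step is omitted here because it depends on the model and prompt used.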

Abstract

We present RAM-EHR, a Retrieval AugMentation pipeline to improve clinical predictions on Electronic Health Records (EHRs). RAM-EHR first collects multiple knowledge sources, converts them into text format, and uses dense retrieval to obtain information related to medical concepts. This strategy addresses the difficulties associated with complex names for the concepts. RAM-EHR then augments the local EHR predictive model co-trained with consistency regularization to capture complementary information from patient visits and summarized knowledge. Experiments on two EHR datasets show the efficacy of RAM-EHR over previous knowledge-enhanced baselines (3.4% gain in AUROC and 7.2% gain in AUPR), emphasizing the effectiveness of the summarized knowledge from RAM-EHR for clinical prediction tasks. The code will be published at https://github.com/ritaranx/RAM-EHR.
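To make the idea of co-training with consistency regularization concrete, here is a minimal sketch for a binary prediction task. The exact objective is not stated in this summary, so everything below is an assumption: the supervised term is binary cross-entropy for each of the two models, the consistency term is a symmetric KL divergence between their predicted probabilities, and `consistency_weight` is a hypothetical hyperparameter name.

```python
import numpy as np

def bce(p, y, eps=1e-7):
    """Element-wise binary cross-entropy between probabilities p and labels y."""
    p = np.clip(p, eps, 1 - eps)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def bernoulli_kl(p, q, eps=1e-7):
    """Element-wise KL(Bernoulli(p) || Bernoulli(q))."""
    p = np.clip(p, eps, 1 - eps)
    q = np.clip(q, eps, 1 - eps)
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def co_training_loss(p_aug, p_local, y, consistency_weight=1.0):
    """Supervised loss for both models plus a penalty when they disagree (assumed form)."""
    supervised = bce(p_aug, y) + bce(p_local, y)
    consistency = 0.5 * (bernoulli_kl(p_aug, p_local) + bernoulli_kl(p_local, p_aug))
    return float(np.mean(supervised + consistency_weight * consistency))

# Hypothetical predictions from the knowledge-augmented model and the local EHR model
y = np.array([1.0, 0.0, 1.0])
p_aug = np.array([0.9, 0.2, 0.6])
p_local = np.array([0.7, 0.1, 0.4])

loss_with_consistency = co_training_loss(p_aug, p_local, y, consistency_weight=1.0)
loss_without_consistency = co_training_loss(p_aug, p_local, y, consistency_weight=0.0)
```

The consistency term vanishes when the two models agree and grows as their predictions diverge, which is what pushes each model to absorb information the other has captured.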

Paper Structure

This paper contains 29 sections, 9 equations, 5 figures, 3 tables, and 1 algorithm.

Figures (5)

  • Figure 1: An overview of the retrieval augmentation framework (left) and a detailed workflow of RAM-EHR (right). RAM-EHR first gathers multiple knowledge sources and converts them into textual format. We then use dense retrieval to obtain information related to medical concepts. Next, we design an additional module to augment the local EHR predictive model co-trained with consistency regularization, capturing complementary information from both patient visits and summarized knowledge.
  • Figure 2: Effect of $g_\phi$ and $f_\theta$ on both datasets.
  • Figure 3: Studies on information source $\mathcal{M}$.
  • Figure 4: Parameter studies of $\beta$ and $\lambda$ on both datasets.
  • Figure 5: Case study and human study. The case study compares knowledge summarized by RAM-EHR with knowledge generated directly by LLM prompting. Bold denotes disease, medication, and procedure concepts. Blue and red indicate useful and irrelevant knowledge, respectively.