REALM: RAG-Driven Enhancement of Multimodal Electronic Health Records Analysis via Large Language Models

Yinghao Zhu; Changyu Ren; Shiyun Xie; Shukai Liu; Hangyuan Ji; Zixiang Wang; Tao Sun; Long He; Zhoujun Li; Xi Zhu; Chengwei Pan

TL;DR

REALM tackles the problem of leveraging both unstructured clinical notes and time-series EHR data by embedding them with an LLM and a GRU, respectively, and then augmenting these representations with knowledge retrieved from a professionally labeled knowledge graph (KG) via a Retrieval-Augmented Generation (RAG) pipeline. REALM extracts disease entities from both modalities, matches them to PrimeKG using a cosine-similarity threshold, and encodes the retrieved knowledge with an LLM to form a knowledge representation $h_{RAG}$. An adaptive fusion network based on self- and cross-attention then combines $h_{RAG}$ with the original multimodal embeddings. The approach achieves state-of-the-art performance on the MIMIC-III mortality and 30-day readmission tasks, demonstrates robustness to data sparsity, and includes an analysis of retrieved-entity quality, all while operating offline to support privacy and clinical applicability. This work advances clinical AI by tightly integrating long-context medical knowledge with multimodal EHR data to improve predictive accuracy and interpretability in real-world settings.
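The entity-matching step mentioned above can be illustrated with a minimal sketch. It assumes entities and KG nodes have already been embedded as vectors; the function name `match_entities`, the toy 3-dimensional embeddings, and the threshold value of 0.8 are hypothetical choices for illustration and are not taken from the paper.

```python
import numpy as np

def match_entities(entity_vecs, kg_vecs, kg_names, threshold=0.8):
    """For each extracted entity, return the most similar KG node name,
    or None when the best cosine similarity is below the threshold."""
    # L2-normalise rows so that a dot product equals cosine similarity.
    e = entity_vecs / np.linalg.norm(entity_vecs, axis=1, keepdims=True)
    k = kg_vecs / np.linalg.norm(kg_vecs, axis=1, keepdims=True)
    sims = e @ k.T                       # shape: (n_entities, n_kg_nodes)
    best = sims.argmax(axis=1)           # index of the closest KG node
    return [
        kg_names[j] if sims[i, j] >= threshold else None
        for i, j in enumerate(best)
    ]

# Toy data (hypothetical): three KG nodes and two extracted entities.
kg_names = ["sepsis", "heart failure", "pneumonia"]
kg_vecs = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
entity_vecs = np.array([[0.9, 0.1, 0.0],   # close to "sepsis"
                        [0.5, 0.5, 0.5]])  # not close to any single node

matched = match_entities(entity_vecs, kg_vecs, kg_names, threshold=0.8)
print(matched)  # ['sepsis', None]
```

Entities that fall below the threshold are dropped rather than forced onto the nearest node, which is one plausible way to keep unrelated knowledge out of the retrieved set.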

Abstract

The integration of multimodal Electronic Health Records (EHR) data has significantly improved clinical predictive capabilities. Although existing models leverage clinical notes and multivariate time-series EHR, they often lack the medical context relevant to clinical tasks, prompting the incorporation of external knowledge, particularly from knowledge graphs (KG). Previous approaches using KG knowledge have primarily focused on structured knowledge extraction, neglecting unstructured data modalities and semantic, high-dimensional medical knowledge. In response, we propose REALM, a Retrieval-Augmented Generation (RAG) driven framework that enhances multimodal EHR representations and addresses these limitations. Firstly, we apply a Large Language Model (LLM) to encode long-context clinical notes and a GRU model to encode time-series EHR data. Secondly, we prompt the LLM to extract task-relevant medical entities and match them against a professionally labeled external knowledge graph (PrimeKG) to obtain the corresponding medical knowledge. By matching and aligning with clinical standards, our framework eliminates hallucinations and ensures consistency. Lastly, we propose an adaptive multimodal fusion network to integrate the extracted knowledge with multimodal EHR data. Our extensive experiments on MIMIC-III mortality and readmission tasks showcase the superior performance of our REALM framework over baselines and emphasize the effectiveness of each module. The REALM framework contributes to refining the use of multimodal EHR data in healthcare and to bridging the gap with the nuanced medical context essential for informed clinical predictions.
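The fusion step is described on this page only at a high level (an adaptive network that uses self- and cross-attention). The sketch below shows one way such a fusion could be wired, purely as an illustration: it uses single-head scaled dot-product attention without learned projections, random toy embeddings, and a final concatenate-and-average step, all of which are assumptions rather than the paper's actual architecture. The names `h_ts`, `h_text`, and `attention` are hypothetical; `h_rag` stands in for $h_{RAG}$.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    ex = np.exp(x)
    return ex / ex.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Scaled dot-product attention; q: (n_q, d), k and v: (n_k, d)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

rng = np.random.default_rng(0)
d = 8                                   # toy embedding size (assumption)
h_ts = rng.normal(size=(1, d))          # time-series embedding (e.g. from a GRU)
h_text = rng.normal(size=(1, d))        # clinical-note embedding (e.g. from an LLM)
h_rag = rng.normal(size=(3, d))         # three retrieved-knowledge embeddings

# Self-attention: the two EHR modality tokens attend to each other.
ehr = np.concatenate([h_ts, h_text], axis=0)        # (2, d)
ehr_self = attention(ehr, ehr, ehr)                 # (2, d)

# Cross-attention: EHR tokens query the retrieved knowledge.
ehr_cross = attention(ehr_self, h_rag, h_rag)       # (2, d)

# Combine both views and pool over tokens into one fused vector.
fused = np.concatenate([ehr_self, ehr_cross], axis=-1).mean(axis=0)
print(fused.shape)  # (16,)
```

In a trained model the queries, keys, and values would pass through learned linear projections, and the pooled vector would feed a prediction head; those parts are omitted here to keep the sketch short.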

Paper Structure (38 sections, 12 equations, 6 figures, 4 tables)

Introduction
Related Work
Multimodal EHR Learning
Incorporating External Knowledge for EHR
Problem Formulation
Methodology
Overview
Multimodal EHR Embedding Extraction
RAG-Driven Enhancement Pipeline
Extract Entities from Multimodal EHR Data
RAG module for time series
RAG module for clinical text records
Match extracted entities with external KG
Encode KG Knowledge
Multimodal Fusion Network
...and 23 more sections

Figures (6)

Figure 1: Overall architecture of our proposed REALM framework.
Figure 2: RAG pipeline for time series EHR modality.
Figure 3: RAG pipeline for clinical notes modality.
Figure 4: Fusion module. It combines multimodal embeddings with attention mechanism into a fused representation.
Figure 5: AUPRC Performance across 4 Sparsity Levels on MIMIC-III mortality outcome task. REALM exhibits better performance on multiple missing rate levels than recent SOTA baselines.
...and 1 more figure
