REALM: RAG-Driven Enhancement of Multimodal Electronic Health Records Analysis via Large Language Models
Yinghao Zhu, Changyu Ren, Shiyun Xie, Shukai Liu, Hangyuan Ji, Zixiang Wang, Tao Sun, Long He, Zhoujun Li, Xi Zhu, Chengwei Pan
TL;DR
REALM tackles the problem of leveraging both unstructured clinical notes and time-series EHR data by embedding them with an LLM and a GRU, respectively, and then augmenting these representations with knowledge retrieved from a professionally labeled knowledge graph (KG) via a Retrieval-Augmented Generation (RAG) pipeline. By extracting disease entities from both modalities, matching them to PrimeKG with a cosine-similarity threshold, and encoding the retrieved knowledge with an LLM, REALM forms a knowledge representation $h_{RAG}$ that complements the original multimodal embeddings in an adaptive fusion network based on self- and cross-attention. The approach achieves state-of-the-art performance on MIMIC-III mortality and 30-day readmission tasks, demonstrates robustness to data sparsity, and includes an analysis of retrieved-entity quality, all while operating offline to support privacy and clinical applicability. This work advances clinical AI by tightly integrating medical knowledge with long-context clinical notes and multimodal EHR data to improve predictive accuracy and interpretability in real-world settings.
Abstract
The integration of multimodal Electronic Health Records (EHR) data has significantly improved clinical predictive capabilities. Although they leverage clinical notes and multivariate time-series EHR data, existing models often lack the medical context relevant to clinical tasks, prompting the incorporation of external knowledge, particularly from knowledge graphs (KGs). Previous approaches that use KG knowledge have primarily focused on structured knowledge extraction, neglecting unstructured data modalities and semantic, high-dimensional medical knowledge. In response, we propose REALM, a Retrieval-Augmented Generation (RAG) driven framework that enhances multimodal EHR representations and addresses these limitations. First, we apply a Large Language Model (LLM) to encode long-context clinical notes and a GRU model to encode time-series EHR data. Second, we prompt the LLM to extract task-relevant medical entities and match them against a professionally labeled external knowledge graph (PrimeKG) to retrieve the corresponding medical knowledge. By matching and aligning with clinical standards, our framework eliminates hallucinations and ensures consistency. Finally, we propose an adaptive multimodal fusion network to integrate the extracted knowledge with the multimodal EHR data. Our extensive experiments on MIMIC-III mortality and readmission tasks showcase the superior performance of the REALM framework over baselines and emphasize the effectiveness of each module. The REALM framework contributes to refining the use of multimodal EHR data in healthcare and to bridging the gap with the nuanced medical context essential for informed clinical predictions.
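
The entity-matching step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy three-dimensional embeddings, and the threshold of 0.8 are all assumptions chosen for illustration, and a real pipeline would compare embeddings produced by an encoder model against PrimeKG node embeddings. The idea is only that each extracted entity keeps its best-scoring KG node when the cosine similarity clears the threshold, and is discarded otherwise.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def match_entities(extracted, kg_nodes, threshold=0.8):
    """Map each extracted entity to its most similar KG node.

    `extracted` and `kg_nodes` are dicts of name -> embedding vector.
    Entities whose best score falls below `threshold` are dropped.
    The default threshold is a hypothetical value, not one from the paper.
    """
    matches = {}
    for name, vec in extracted.items():
        best_node, best_score = None, -1.0
        for node, node_vec in kg_nodes.items():
            score = cosine_similarity(vec, node_vec)
            if score > best_score:
                best_node, best_score = node, score
        if best_node is not None and best_score >= threshold:
            matches[name] = (best_node, best_score)
    return matches


# Toy embeddings (hypothetical); real ones would come from an embedding model.
kg_nodes = {
    "heart failure": [1.0, 0.0, 0.0],
    "sepsis": [0.0, 1.0, 0.0],
}
extracted = {
    "congestive heart failure": [0.9, 0.1, 0.0],
    "headache": [0.5, 0.5, 0.7],
}

matches = match_entities(extracted, kg_nodes, threshold=0.8)
for entity, (node, score) in matches.items():
    print(f"{entity} -> {node} (cosine similarity {score:.3f})")
```

In this toy example, "congestive heart failure" is matched to the "heart failure" node, while "headache" is dropped because neither node is similar enough. Filtering in this way means that only knowledge attached to a confidently matched KG node is passed on to the later encoding and fusion stages.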
