MEDFuse: Multimodal EHR Data Fusion with Masked Lab-Test Modeling and Large Language Models
Thao Minh Nguyen Phan, Cong-Tinh Dao, Chenwei Wu, Jian-Zhe Wang, Shun Liu, Jun-En Ding, David Restrepo, Feng Liu, Fang-Ming Hung, Wen-Chih Peng
TL;DR
MEDFuse fuses multimodal EHR data in two stages. First, it extracts modality-specific embeddings: clinical notes are encoded by fine-tuned LLMs, and lab tests by a Masked Lab-Test Modeling module. Second, a disentangled transformer, optimized with a mutual information loss, separates modality-specific information from modality-shared information. The resulting joint representation is used for multi-label disease prediction. Validated on the public MIMIC-III dataset and the in-house FEMH dataset, MEDFuse reports an F1 score above 90% on a 10-disease multi-label classification task, with gains over strong baselines. The main contribution is a concrete method for integrating heterogeneous EHR signals while explicitly preserving what each modality contributes on its own, which may improve clinical decision support. The findings suggest that combining textual understanding of clinical notes with structured numerical reasoning over lab tests, tied together by MI-informed fusion, can yield robust diagnostic predictions in real-world healthcare settings.
Abstract
Electronic health records (EHRs) are multimodal by nature, consisting of structured tabular features such as lab tests and unstructured clinical notes. In real-life clinical practice, doctors use complementary multimodal EHR data sources to get a clearer picture of patients' health and to support clinical decision-making. However, most EHR predictive models do not reflect this practice, as they either focus on a single modality or overlook inter-modality interactions and redundancy. In this work, we propose MEDFuse, a Multimodal EHR Data Fusion framework that incorporates masked lab-test modeling and large language models (LLMs) to effectively integrate structured and unstructured medical data. MEDFuse leverages multimodal embeddings extracted from two sources: LLMs fine-tuned on free clinical text and masked tabular transformers trained on structured lab test results. We design a disentangled transformer module, optimized by a mutual information loss, to 1) decouple modality-specific and modality-shared information and 2) extract a useful joint representation from the noise and redundancy present in clinical notes. Through comprehensive validation on the public MIMIC-III dataset and the in-house FEMH dataset, MEDFuse achieves an F1 score above 90% on the 10-disease multi-label classification task, showing its potential for advancing clinical predictions.
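To make the idea of masked lab-test modeling more concrete, the following is a minimal sketch of the general masked-reconstruction setup for tabular values: a fraction of a patient's lab results is hidden, and a model is scored only on how well it recovers the hidden entries. This is not the authors' implementation. The function names `mask_lab_tests` and `masked_mse`, the 50% mask ratio, the use of mean squared error, and the example values are all assumptions made for illustration; the abstract does not specify them.

```python
import random


def mask_lab_tests(values, mask_ratio=0.5, seed=0):
    """Hide a random fraction of lab-test values (assumes a non-empty list).

    Returns (masked_values, mask), where mask[i] is True if entry i was
    hidden and masked_values[i] is None at those positions.
    The mask ratio is an illustrative assumption, not taken from the paper.
    """
    rng = random.Random(seed)
    n = len(values)
    n_masked = max(1, round(n * mask_ratio))  # always hide at least one entry
    hidden = set(rng.sample(range(n), n_masked))
    mask = [i in hidden for i in range(n)]
    masked_values = [None if m else v for v, m in zip(values, mask)]
    return masked_values, mask


def masked_mse(predicted, target, mask):
    """Mean squared error over the hidden positions only."""
    errors = [(p - t) ** 2 for p, t, m in zip(predicted, target, mask) if m]
    return sum(errors) / len(errors)


# Hypothetical lab panel for one patient (values are made up).
labs = [7.2, 140.0, 4.1, 98.0]
masked_labs, mask = mask_lab_tests(labs, mask_ratio=0.5)

# A real model would predict the hidden entries from the visible ones.
# Here a perfect "prediction" stands in, just to show how the loss is scored.
loss = masked_mse(labs, labs, mask)
```

Scoring the loss only on hidden positions is the usual design choice in masked modeling: the model gets no credit for copying values it can already see, so it has to learn dependencies between lab tests.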
