Automated Fusion of Multimodal Electronic Health Records for Better Medical Predictions
Suhan Cui, Jiaqi Wang, Yuan Zhong, Han Liu, Ting Wang, Fenglong Ma
TL;DR
This work addresses outcome prediction from heterogeneous, multimodal EHR data by introducing AutoFM, a neural architecture search framework that automatically designs both modality-specific encoders and a multimodal fusion strategy. AutoFM uses a two-stage search space: the first stage covers modality-specific encoding with interaction operations, and the second is a fusion DAG with a differentiable feature selector and searchable fusion. The search is augmented by a diversity-promoting penalty, and a pruning-based discretization derives the final architectures efficiently. Across four real-world tasks on MIMIC-III data, AutoFM achieves state-of-the-art performance, and the results show the importance of real-time modalities and diverse fusion strategies; ablations verify the contribution of each component. The approach enables automatic, robust, and scalable design of predictive models for multimodal EHR data, with the potential to improve clinical decision support and patient outcomes.
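To make the idea of a searchable fusion step followed by discretization more concrete, the following is a minimal sketch of a generic differentiable-NAS relaxation: each candidate operation is weighted by a softmax over architecture parameters, and the final architecture keeps the highest-weighted candidate. This is not AutoFM's actual search space or pruning rule. The candidate set (`sum`, `product`, `max`), the function names, and the argmax selection are all assumptions chosen for illustration, and the diversity-promoting penalty is not modeled.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical candidate fusion operations over two equal-length feature
# vectors. The real candidate set used by AutoFM is not given in this summary.
CANDIDATE_OPS = {
    "sum": lambda a, b: [x + y for x, y in zip(a, b)],
    "product": lambda a, b: [x * y for x, y in zip(a, b)],
    "max": lambda a, b: [max(x, y) for x, y in zip(a, b)],
}


def mixed_fusion(a, b, alphas):
    """Continuous relaxation: softmax-weighted sum of all candidate outputs."""
    names = list(CANDIDATE_OPS)
    weights = softmax([alphas[n] for n in names])
    out = [0.0] * len(a)
    for w, n in zip(weights, names):
        for i, v in enumerate(CANDIDATE_OPS[n](a, b)):
            out[i] += w * v
    return out


def discretize(alphas):
    """Assumed pruning rule: keep only the candidate with the largest weight."""
    return max(alphas, key=alphas.get)


# Illustrative architecture parameters; in a real search they would be learned.
alphas = {"sum": 0.1, "product": 2.0, "max": -0.5}
a, b = [1.0, 2.0], [3.0, 0.5]
print(mixed_fusion(a, b, alphas))  # blend dominated by the "product" output
print(discretize(alphas))          # product
```

During search, the architecture parameters would be optimized jointly with the network weights by gradient descent; the sketch only shows the forward computation and the final selection step.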
Abstract
The widespread adoption of Electronic Health Record (EHR) systems in healthcare institutions has generated vast amounts of medical data, offering significant opportunities to improve healthcare services through deep learning. However, the complex and diverse modalities and feature structures of real-world EHR data make deep learning model design challenging. Current approaches to multimodal EHR data rely primarily on hand-crafted model architectures based on intuition and empirical experience, which leads to sub-optimal architectures and limited performance. To automate model design for mining EHR data, we propose a novel neural architecture search (NAS) framework named AutoFM, which automatically searches for the optimal model architectures for encoding diverse input modalities and for fusion strategies. We conduct thorough experiments on real-world multimodal EHR data and prediction tasks. The results demonstrate that our framework not only achieves significant performance improvements over existing state-of-the-art methods but also effectively discovers meaningful network architectures.
