Automated Fusion of Multimodal Electronic Health Records for Better Medical Predictions

Suhan Cui; Jiaqi Wang; Yuan Zhong; Han Liu; Ting Wang; Fenglong Ma

TL;DR

This work tackles the challenge of predicting outcomes from heterogeneous, multimodal EHR data by introducing AutoFM, a neural architecture search framework that automatically designs modality-specific encoders and a multimodal fusion strategy. AutoFM uses a two-stage search space: modality-specific encoding with interaction operations, followed by a fusion DAG with a differentiable feature selector and searchable fusion. The search is augmented by a diversity-promoting penalty, and a pruning-based discretization derives the final architectures efficiently. Across four real-world tasks on MIMIC-III data, AutoFM achieves state-of-the-art performance and demonstrates the importance of real-time modalities and diverse fusion strategies, while ablations verify the contribution of each component. The proposed approach enables automatic, robust, and scalable design of predictive models for multimodal EHR data, with potential to improve clinical decision support and patient outcomes.
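To make the idea of a "searchable fusion" concrete, the following is a minimal sketch of a generic differentiable-NAS relaxation, not the authors' implementation. It assumes a DARTS-style softmax mixture over a hypothetical set of candidate fusion operations (`sum`, `product`, `max`) and a simple argmax discretization; AutoFM's actual candidate set, feature selector, diversity penalty, and pruning-based discretization are not reproduced here. All names (`CANDIDATE_OPS`, `mixed_fusion`, `discretize`) are illustrative.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical candidate fusion operations over two equal-length feature vectors.
# These are assumptions for illustration, not the operation set used in the paper.
CANDIDATE_OPS = {
    "sum": lambda a, b: [x + y for x, y in zip(a, b)],
    "product": lambda a, b: [x * y for x, y in zip(a, b)],
    "max": lambda a, b: [max(x, y) for x, y in zip(a, b)],
}


def mixed_fusion(a, b, alpha):
    """Continuous relaxation: softmax-weighted sum of every candidate's output."""
    names = list(CANDIDATE_OPS)
    weights = softmax([alpha[name] for name in names])
    out = [0.0] * len(a)
    for weight, name in zip(weights, names):
        fused = CANDIDATE_OPS[name](a, b)
        out = [o + weight * f for o, f in zip(out, fused)]
    return out


def discretize(alpha):
    """Simplified discretization: keep the candidate with the largest weight."""
    return max(alpha, key=alpha.get)


# Two toy modality embeddings and uniform architecture parameters.
feat_a = [1.0, 2.0]
feat_b = [3.0, 0.5]
uniform_alpha = {"sum": 0.0, "product": 0.0, "max": 0.0}
relaxed = mixed_fusion(feat_a, feat_b, uniform_alpha)

# After a (hypothetical) search, the architecture parameters favor one operation.
searched_alpha = {"sum": 1.2, "product": -0.3, "max": 0.4}
chosen = discretize(searched_alpha)
```

During search, the architecture parameters would be optimized jointly with the network weights by gradient descent; in this sketch they are set by hand only to show how the relaxed output and the final discrete choice are computed.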

Abstract

The widespread adoption of Electronic Health Record (EHR) systems in healthcare institutions has generated vast amounts of medical data, offering significant opportunities for improving healthcare services through deep learning techniques. However, the complex and diverse modalities and feature structures in real-world EHR data pose great challenges for deep learning model design. To address the multi-modality challenge in EHR data, current approaches primarily rely on hand-crafted model architectures based on intuition and empirical experience, leading to sub-optimal model architectures and limited performance. Therefore, to automate the process of model design for mining EHR data, we propose a novel neural architecture search (NAS) framework named AutoFM, which can automatically search for the optimal model architectures for encoding diverse input modalities and fusion strategies. We conduct thorough experiments on real-world multi-modal EHR data and prediction tasks, and the results demonstrate that our framework not only achieves significant performance improvement over existing state-of-the-art methods but also discovers meaningful network architectures effectively.

Paper Structure (27 sections, 6 equations, 2 figures, 7 tables, 1 algorithm)

Introduction
C1 -- Diversifying the Search Space
C2 -- Customizing the Search Optimization
C3 -- Deriving the Optimal Architecture
Our Approach
Related Work
Modeling Multi-modal EHR data
Neural Architecture Search
Methodology
Multimodal EHR Data Embedding
Multi-Modal Search Space Design
Modality Specific Search
Multimodal Fusion Search
Feature Selector
Searchable Fusion
...and 12 more sections

Figures (2)

Figure 1: Overview of the proposed AutoFM.
Figure 2: Searched architecture. Blue arrows represent fixed operations; black arrows represent searched operations. $\mathrm{interact}(\cdot)$ denotes the interaction operation with the corresponding feature. For the step nodes $[\mathbf{g}_1, \mathbf{g}_2, \mathbf{g}_3]$, we omit the notation in the figure and label each node with its selected fusion operations, e.g., (sum+att).
