MUSE-Net: Missingness-aware mUlti-branching Self-attention Encoder for Irregular Longitudinal Electronic Health Records

Zekai Wang, Tieming Liu, Bing Yao

TL;DR

This paper proposes a novel Missingness-aware mUlti-branching Self-Attention Encoder (MUSE-Net) to cope with the challenges in modeling longitudinal EHRs for data-driven disease prediction and evaluates the proposed MUSE-Net using both synthetic and real-world datasets.

Abstract

The era of big data has made vast amounts of clinical data readily available, particularly in the form of electronic health records (EHRs), which provide unprecedented opportunities for developing data-driven diagnostic tools to enhance clinical decision making. However, the application of EHRs in data-driven modeling faces challenges such as irregularly spaced multivariate time series, incompleteness, and data imbalance. Realizing the full potential of EHR data hinges on the development of advanced analytical models. In this paper, we propose a novel Missingness-aware mUlti-branching Self-Attention Encoder (MUSE-Net) to cope with the challenges in modeling longitudinal EHRs for data-driven disease prediction. The proposed MUSE-Net is composed of four novel modules: (1) a multi-task Gaussian process (MGP) with missing value masks for data imputation; (2) a multi-branching architecture to address the data imbalance problem; (3) a time-aware self-attention encoder to account for the irregularly spaced time intervals in longitudinal EHRs; (4) an interpretable multi-head attention mechanism that provides insights into the importance of different time points in disease prediction, allowing clinicians to trace model decisions. We evaluate the proposed MUSE-Net using both synthetic and real-world datasets. Experimental results show that MUSE-Net outperforms existing methods that are widely used to investigate longitudinal signals.
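
To make two of the ideas in the abstract concrete, the following is a minimal NumPy sketch of (a) building a missing value mask alongside imputed data and (b) a self-attention step whose scores depend on the time gaps between visits. It is an illustration under stated assumptions, not the authors' implementation: the function names, the column-mean imputation (a stand-in for the MGP), the concatenation of values and mask, and the linear gap penalty with the `decay` parameter are all hypothetical choices made for this example.

```python
import numpy as np

def missing_mask(x):
    """Return 1.0 where a value is observed and 0.0 where it is NaN."""
    return (~np.isnan(x)).astype(float)

def time_aware_attention(values, times, decay=0.1):
    """Single-head self-attention whose scores are penalized by time gaps.

    values: (T, D) visit features; times: (T,) visit timestamps.
    The linear gap penalty is an illustrative assumption, not the paper's formula.
    """
    d = values.shape[1]
    scores = values @ values.T / np.sqrt(d)          # (T, T) dot-product scores
    gaps = np.abs(times[:, None] - times[None, :])   # (T, T) pairwise time gaps
    scores = scores - decay * gaps                   # distant visits get less weight
    scores -= scores.max(axis=1, keepdims=True)      # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)    # row-wise softmax
    return weights @ values, weights

# Toy patient: 4 visits at irregular times, 2 lab variables, NaN = not measured.
times = np.array([0.0, 1.0, 4.0, 10.0])
x = np.array([[1.0, np.nan],
              [np.nan, 0.5],
              [2.0, 1.5],
              [np.nan, np.nan]])
mask = missing_mask(x)

# Placeholder imputation (column mean); the paper uses an MGP instead.
col_mean = np.nanmean(x, axis=0)
x_imputed = np.where(np.isnan(x), col_mean, x)

# Assumption: feed imputed values and the mask side by side to the encoder.
encoder_input = np.concatenate([x_imputed, mask], axis=1)
output, attn = time_aware_attention(encoder_input, times)
```

Each row of `attn` is a distribution over visits, so inspecting it gives a rough sense of how attention weights can indicate which time points matter for a prediction.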

Paper Structure

This paper contains 19 sections, 27 equations, 6 figures, 6 tables, 2 algorithms.

Figures (6)

  • Figure 1: (a) MGP for data imputation and missing value mask generation. The imputed data and missing value masks are processed by MUSE-Net, which consists of (b) a Missingness-aware Self-attention Encoder with Interpretable Multi-head Attention and (c) a Multi-branching output layer.
  • Figure 2: Evaluation scores across epochs for MUSE-Net-9 with different imputation methods on simulated validation data.
  • Figure 3: Evaluation metric scores across epochs for MUSE-Net with varying MB outputs on MGP-imputed validation data.
  • Figure 4: AUROC and AUPRC across 15 epochs for our MUSE-Net and other benchmarks on the DR validation set.
  • Figure 5: Attention maps averaged over all test samples for the first and second layers of MUSE-Net: (a) the first layer; (b) the second layer.
  • ...and 1 more figures
