Guiding Masked Representation Learning to Capture Spatio-Temporal Relationship of Electrocardiogram

Yeongyeon Na, Minje Park, Yunwon Tae, Sunghoon Joo

TL;DR

ST-MEM (Spatio-Temporal Masked Electrocardiogram Modeling), designed to learn spatio-temporal features by reconstructing masked 12-lead ECG data, outperforms other SSL baseline methods in various experimental settings for arrhythmia classification tasks and is adaptable to various lead combinations.

Abstract

Electrocardiograms (ECGs) are widely used as a diagnostic tool for monitoring the electrical signals of the heart. Recent machine learning research has focused on screening for various diseases using ECG signals. However, building such screening applications is challenging because labeled ECG data are limited. Learning general representations through self-supervised learning (SSL) is a well-known approach to overcoming the scarcity of labeled data; however, a naive application of SSL to ECG data, without considering the spatio-temporal relationships inherent in ECG signals, may yield suboptimal results. In this paper, we introduce ST-MEM (Spatio-Temporal Masked Electrocardiogram Modeling), designed to learn spatio-temporal features by reconstructing masked 12-lead ECG data. ST-MEM outperforms other SSL baseline methods in various experimental settings for arrhythmia classification tasks. Moreover, we demonstrate that ST-MEM is adaptable to various lead combinations. Through quantitative and qualitative analysis, we demonstrate the spatio-temporal relationship within ECG data. Our code is available at https://github.com/bakqui/ST-MEM.

Paper Structure

This paper contains 35 sections, 8 figures, and 22 tables.

Figures (8)

  • Figure 1: An illustration of a 12-lead electrocardiogram (ECG). ECG signals consist of 12 leads, each measured from a different spatial location. Limb leads (i.e., I, II, III, aVR, aVL, and aVF) are generated from the frontal plane, while precordial leads (i.e., V1, V2, V3, V4, V5, and V6) are obtained from the horizontal plane.
  • Figure 2: An illustration of spatio-temporal patchifying. The black dashed box indicates the query patch; each arrow represents a self-attention connection; each color represents a patch, i.e., a single input to the model. Temporal patchifying in (a) provides three different patches (i.e., three different inputs). Spatial patchifying in (b) yields 12 patches, one per lead. Spatio-temporal patchifying in (c) provides fine-grained input signals for the model, which allows it to capture both spatial and temporal relationships.
  • Figure 3: An overview of our proposed method. ST-MEM consists of an encoder and a decoder for reconstructing the masked ECG signals. The encoder takes patchified ECG signals with lead and position embeddings. The shared decoder reconstructs the masked ECG signals for each lead by utilizing the encoded representations.
  • Figure 4: A t-SNE plot of ECG signal representations learned by ST-MEM. Each circle represents the representation of a single ECG signal from one of the leads. The blue and orange ellipses indicate the Gaussians (i.e., clusters) obtained from a Gaussian mixture model (GMM).
  • Figure 5: An illustration of an attention map. Attention scores from each encoder layer and head are averaged.
  • ...and 3 more figures
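
The three patchifying schemes described in the Figure 2 caption can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that); the function names, the toy signal, and the patch length of 2 are all assumptions chosen only to reproduce the patch counts mentioned in the caption (three temporal patches, 12 spatial patches).

```python
from typing import List, Tuple

Signal = List[List[float]]  # layout: [num_leads][num_samples]


def temporal_patches(ecg: Signal, patch_len: int) -> List[Signal]:
    """Split along time only: each patch spans all leads for one time window."""
    num_windows = len(ecg[0]) // patch_len  # any trailing remainder is dropped
    return [
        [lead[w * patch_len:(w + 1) * patch_len] for lead in ecg]
        for w in range(num_windows)
    ]


def spatial_patches(ecg: Signal) -> List[List[float]]:
    """Split along leads only: each patch is one full-length lead."""
    return [list(lead) for lead in ecg]


def spatio_temporal_patches(
    ecg: Signal, patch_len: int
) -> List[Tuple[int, int, List[float]]]:
    """Split along both axes: each patch is one time window of one lead.

    Returns (lead_index, window_index, samples) tuples, so a model could
    attach a lead embedding and a position embedding to every patch.
    """
    num_windows = len(ecg[0]) // patch_len
    return [
        (lead_idx, w, lead[w * patch_len:(w + 1) * patch_len])
        for lead_idx, lead in enumerate(ecg)
        for w in range(num_windows)
    ]


# Toy input: 12 leads, 6 samples each (placeholder values, not real ECG data).
ecg = [[float(lead * 10 + t) for t in range(6)] for lead in range(12)]
patch_len = 2  # hypothetical patch length

print(len(temporal_patches(ecg, patch_len)))         # 3
print(len(spatial_patches(ecg)))                     # 12
print(len(spatio_temporal_patches(ecg, patch_len)))  # 36
```

With the spatio-temporal scheme, the toy signal produces 12 × 3 = 36 patches, each tagged with its lead and time-window index, which is the finer granularity the caption refers to.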