Masked Transformer for Electrocardiogram Classification

Ya Zhou, Xiaolin Diao, Yanni Huo, Yang Liu, Xiaohan Fan, Wei Zhao

TL;DR

This work addresses ECG classification by reframing ECG time series as segment tokens and applying a masked Transformer learned via self-supervised pre-training. By extending the masked autoencoder paradigm from images to ECG data, MTECG employs a lightweight encoder, a 1-layer decoder, learnable positional embeddings, and a fluctuated reconstruction target to learn robust wave-shape features from unlabeled data, then fine-tunes for diagnosis. The authors introduce a large-scale ECG dataset (Fuwai) and demonstrate that pre-training plus careful fine-tuning yields substantial macro F1 gains across Fuwai, PTB-XL, and PCinC datasets, outperforming several state-of-the-art methods. The approach offers a deployment-friendly, Transformer-based solution for ECG tasks and suggests that training strategy and data usage are as crucial as model choice for success in this domain.

Abstract

The electrocardiogram (ECG) is one of the most important diagnostic tools in clinical applications. With the advent of advanced algorithms, various deep learning models have been adopted for ECG tasks. However, the potential of Transformers for ECG data has not been fully realized, despite their widespread success in computer vision and natural language processing. In this work, we present Masked Transformer for ECG classification (MTECG), a simple yet effective method that significantly outperforms recent state-of-the-art algorithms in ECG classification. Our approach adapts image-based masked autoencoders to self-supervised representation learning from ECG time series. We utilize a lightweight Transformer for the encoder and a 1-layer Transformer for the decoder. The ECG signal is split into a sequence of non-overlapping segments along the time dimension, and learnable positional embeddings are added to preserve the sequential information. To explore the potential of Transformers, we construct the Fuwai dataset, comprising 220,251 ECG recordings with a broad range of diagnoses annotated by medical experts. A strong pre-training and fine-tuning recipe is derived from an empirical study. The experiments demonstrate that the proposed method increases the macro F1 scores by 3.4%-27.5% on the Fuwai dataset, 9.9%-32.0% on the PTB-XL dataset, and 9.4%-39.1% on a multicenter dataset, compared with alternative methods. We hope that this study can direct future research on the application of Transformers to more ECG tasks.
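
The segmentation and masking steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the 12-lead 5000-sample input, the segment length of 50, the mask ratio of 0.25, and the choice to treat each lead's segments as separate tokens are all assumptions made for illustration.

```python
import numpy as np

def split_into_segments(ecg, segment_len):
    """Split a (leads, samples) array into non-overlapping time segments.

    Returns shape (leads, n_segments, segment_len). Trailing samples that
    do not fill a whole segment are dropped (an assumption of this sketch).
    """
    n_leads, n_samples = ecg.shape
    n_segments = n_samples // segment_len
    trimmed = ecg[:, : n_segments * segment_len]
    return trimmed.reshape(n_leads, n_segments, segment_len)

def random_mask(n_tokens, mask_ratio, rng):
    """Return a boolean array in which True marks a masked token."""
    n_masked = int(round(n_tokens * mask_ratio))
    mask = np.zeros(n_tokens, dtype=bool)
    mask[rng.choice(n_tokens, size=n_masked, replace=False)] = True
    return mask

rng = np.random.default_rng(0)
ecg = rng.standard_normal((12, 5000))        # hypothetical 12-lead recording
segments = split_into_segments(ecg, segment_len=50)
tokens = segments.reshape(-1, segments.shape[-1])   # one token per segment
mask = random_mask(tokens.shape[0], mask_ratio=0.25, rng=rng)
visible = tokens[~mask]                      # only these reach the encoder
```

In a masked-autoencoder setup, only the `visible` tokens (plus their positional embeddings) would be passed through the encoder, and the decoder would be asked to reconstruct the tokens selected by `mask`.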

Paper Structure

This paper contains 25 sections, 10 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: An example of the masked pre-training method. The original ECG signals are split into non-overlapping segments, and a subset of these segments is masked out. The unmasked segments are used to reconstruct the fluctuated transformation of the masked segments through Transformer blocks. The lead and sequence information of the ECG signals is preserved by learnable positional embeddings. The masked segments are represented by learnable embeddings in the reconstruction task.
  • Figure 2: Performance comparison between masked pre-training and training from scratch on the Fuwai dataset. The optimal epoch for masked pre-training is found to be 48, whereas for training from scratch, it is 102.
  • Figure 3: Classification performance in the ablation study. The first column, from top to bottom, corresponds to masking ratio and pre-training schedule lengths, respectively. Similarly, the second column, from top to bottom, corresponds to layer-wise LR decay and DropPath rate, respectively.
  • Figure 4: Reconstruction examples from one patient in the validation set under different reconstruction targets. The first row corresponds to the whole ECG signal, while the second row corresponds to a sequence of signal segments. The columns represent the original, per-segment normalization (\ref{eqn:per-segment}), and squaring operation (\ref{eqn:squaring}) targets, respectively. The gray regions indicate the masked segments. Black points represent the original signal, while blue points within the masked regions represent the reconstructed signal.
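
The per-segment normalization target named in Figure 4 is not defined in this excerpt. As a rough illustration only, the sketch below uses the per-patch normalization familiar from image masked autoencoders (subtract each segment's mean, divide by its standard deviation) and a mean squared error computed over masked segments. The function names and the `eps` value are assumptions, and the paper's own equations may differ. The squaring target is not sketched because its formula is not given here.

```python
import numpy as np

def per_segment_normalize(tokens, eps=1e-6):
    """Normalize each segment (row) to zero mean and unit variance.

    Assumed form, borrowed from image masked autoencoders; `eps` guards
    against division by zero on flat segments.
    """
    mean = tokens.mean(axis=-1, keepdims=True)
    var = tokens.var(axis=-1, keepdims=True)
    return (tokens - mean) / np.sqrt(var + eps)

def masked_mse(pred, target, mask):
    """Mean squared error averaged over masked segments only.

    `mask` is a boolean array with True for masked segments and must
    contain at least one True entry.
    """
    per_token = ((pred - target) ** 2).mean(axis=-1)
    return per_token[mask].mean()

rng = np.random.default_rng(0)
tokens = rng.standard_normal((8, 50)) * 3.0 + 1.5   # hypothetical segments
target = per_segment_normalize(tokens)
mask = np.array([True, False, True, False, False, True, False, False])
loss_perfect = masked_mse(target, target, mask)      # perfect reconstruction
loss_zero_pred = masked_mse(np.zeros_like(target), target, mask)
```

Normalizing per segment removes each segment's baseline and amplitude, so the reconstruction loss emphasizes the shape of the waveform within the segment rather than its absolute level.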