Masked Transformer for Electrocardiogram Classification
Ya Zhou, Xiaolin Diao, Yanni Huo, Yang Liu, Xiaohan Fan, Wei Zhao
TL;DR
This work addresses ECG classification by reframing ECG time series as segment tokens and applying a masked Transformer trained via self-supervised pre-training. By extending the masked autoencoder paradigm from images to ECG data, MTECG (Masked Transformer for ECG classification) employs a lightweight encoder, a 1-layer decoder, learnable positional embeddings, and a fluctuated reconstruction target to learn robust wave-shape features from unlabeled data, and is then fine-tuned for diagnosis. The authors introduce a large-scale ECG dataset (Fuwai) and demonstrate that pre-training plus careful fine-tuning yields substantial macro F1 gains on the Fuwai, PTB-XL, and PCinC datasets, outperforming several state-of-the-art methods. The approach offers a deployment-friendly, Transformer-based solution for ECG tasks and suggests that training strategy and data usage are as crucial as model choice for success in this domain.
Abstract
The electrocardiogram (ECG) is one of the most important diagnostic tools in clinical applications. With the advent of advanced algorithms, various deep learning models have been adopted for ECG tasks. However, the potential of Transformers for ECG data has not been fully realized, despite their widespread success in computer vision and natural language processing. In this work, we present Masked Transformer for ECG classification (MTECG), a simple yet effective method that significantly outperforms recent state-of-the-art algorithms in ECG classification. Our approach adapts image-based masked autoencoders to self-supervised representation learning from ECG time series. We use a lightweight Transformer for the encoder and a 1-layer Transformer for the decoder. The ECG signal is split into a sequence of non-overlapping segments along the time dimension, and learnable positional embeddings are added to preserve the sequential information. To explore the potential of Transformers, we construct the Fuwai dataset, comprising 220,251 ECG recordings with a broad range of diagnoses annotated by medical experts. We propose a strong pre-training and fine-tuning recipe derived from an empirical study. The experiments demonstrate that, compared to the alternative methods, the proposed method increases the macro F1 scores by 3.4%-27.5% on the Fuwai dataset, 9.9%-32.0% on the PTB-XL dataset, and 9.4%-39.1% on a multicenter dataset. We hope that this study will guide future research on applying Transformers to more ECG tasks.
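
The segmentation and masking steps described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `segment_ecg` and `random_mask`, the 12-lead 5000-sample input, the segment length of 25, the embedding size of 64, and the mask ratio of 0.25 are all hypothetical choices made for illustration. The projection and positional embedding arrays stand in for parameters that would be learned in a real model, and the plain masked-segment target shown here does not reproduce the fluctuated reconstruction target.

```python
import numpy as np


def segment_ecg(signal, segment_len):
    """Split a (leads, time) ECG array into non-overlapping time segments.

    Returns an array of shape (num_segments, leads * segment_len), one
    flattened token per segment. Trailing samples that do not fill a whole
    segment are dropped (an assumption of this sketch).
    """
    leads, length = signal.shape
    num_segments = length // segment_len
    trimmed = signal[:, : num_segments * segment_len]
    # (leads, num_segments, segment_len) -> (num_segments, leads, segment_len)
    tokens = trimmed.reshape(leads, num_segments, segment_len).transpose(1, 0, 2)
    return tokens.reshape(num_segments, leads * segment_len)


def random_mask(num_tokens, mask_ratio, rng):
    """Randomly choose which token indices stay visible and which are masked."""
    num_masked = int(round(num_tokens * mask_ratio))
    order = rng.permutation(num_tokens)
    masked_idx = np.sort(order[:num_masked])
    visible_idx = np.sort(order[num_masked:])
    return visible_idx, masked_idx


rng = np.random.default_rng(0)

# Hypothetical input: 12 leads, 5000 samples of synthetic noise.
ecg = rng.standard_normal((12, 5000))
tokens = segment_ecg(ecg, segment_len=25)  # shape (200, 300)

# Stand-ins for learnable parameters (randomly initialised here).
embed_dim = 64
projection = rng.standard_normal((tokens.shape[1], embed_dim)) * 0.02
pos_embed = rng.standard_normal((tokens.shape[0], embed_dim)) * 0.02

# Embed every segment and add its positional embedding.
embedded = tokens @ projection + pos_embed  # shape (200, 64)

# Only visible tokens are passed to the encoder; masked segments are the
# reconstruction target for the decoder.
visible_idx, masked_idx = random_mask(len(tokens), mask_ratio=0.25, rng=rng)
encoder_input = embedded[visible_idx]       # shape (150, 64)
reconstruction_target = tokens[masked_idx]  # shape (50, 300)
```

Because the positional embedding is indexed by segment position before masking, each visible token keeps its place in the original sequence even though the encoder only sees a subset of the segments.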
