Reading Your Heart: Learning ECG Words and Sentences via Pre-training ECG Language Model

Jiarui Jin, Haoyu Wang, Hongyan Li, Jun Li, Jiahui Pan, Shenda Hong

TL;DR

This work addresses the annotation bottleneck in ECG analysis by reframing ECG signals as a language in which heartbeats are words and rhythms are sentences. It introduces HeartLang, a self-supervised framework with four components: a QRS-Tokenizer, an ST-ECGFormer backbone, vector-quantized heartbeat reconstruction (VQ-HBR) to build a large ECG vocabulary, and masked ECG sentence pre-training to learn rhythm-level representations. The authors construct the largest heartbeat-based vocabulary to date (5,394 words) and report competitive, often superior, performance across six datasets under linear probing and other downstream evaluations, indicating better generalization and richer semantics than fixed-window eSSL methods. The approach enables form- and rhythm-level representation learning without labels, which matters in practice for robust, annotation-efficient cardiac diagnostics.
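
The vocabulary and masking ideas summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy 3-sample "heartbeats", the hand-written codebook, the `MASK` id, and the helper names `nearest_word`, `quantize_sentence`, and `mask_sentence` are all assumptions made for illustration. The sketch uses plain nearest-neighbor lookup, the core operation of vector quantization, and then hides some word indices so that a model could be trained to predict them.

```python
import random

MASK = -1  # hypothetical id marking a hidden heartbeat (assumption)

def nearest_word(beat, codebook):
    """Return the index of the codebook vector closest to `beat`
    under squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(beat, codebook[i]))

def quantize_sentence(beats, codebook):
    """Map every heartbeat in a sentence to its vocabulary index."""
    return [nearest_word(b, codebook) for b in beats]

def mask_sentence(word_ids, mask_ratio, rng):
    """Hide a fraction of the word ids; return the masked sequence and
    a {position: original id} dict of prediction targets."""
    n_mask = max(1, round(mask_ratio * len(word_ids)))
    positions = sorted(rng.sample(range(len(word_ids)), n_mask))
    masked = list(word_ids)
    for p in positions:
        masked[p] = MASK
    targets = {p: word_ids[p] for p in positions}
    return masked, targets

# Toy data: a 3-entry "vocabulary" and a 4-beat "sentence".
codebook = [[0.0, 1.0, 0.0], [0.0, 2.0, -0.5], [0.5, 0.5, 0.5]]
beats = [[0.1, 0.9, 0.0], [0.0, 1.9, -0.4], [0.4, 0.6, 0.5], [0.0, 1.1, 0.1]]

word_ids = quantize_sentence(beats, codebook)
masked, targets = mask_sentence(word_ids, mask_ratio=0.5, rng=random.Random(0))
print(word_ids, masked, targets)
```

In a learned setting the codebook would be trained (here, through heartbeat reconstruction) rather than written by hand, and the lookup would typically run on encoder embeddings rather than raw samples.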

Abstract

The electrocardiogram (ECG) is essential for the clinical diagnosis of arrhythmias and other heart diseases, but ECG-based deep learning methods are often limited by their need for high-quality annotations. Although previous ECG self-supervised learning (eSSL) methods have made significant progress in representation learning from unannotated ECG data, they typically treat ECG signals as ordinary time-series data and segment them with fixed-size, fixed-step time windows, an approach that often ignores the form and rhythm characteristics and the latent semantic relationships in ECG signals. In this work, we introduce a novel perspective on ECG signals, treating heartbeats as words and rhythms as sentences. Based on this perspective, we first design the QRS-Tokenizer, which generates semantically meaningful ECG sentences from raw ECG signals. Building on it, we then propose HeartLang, a novel self-supervised learning framework for ECG language processing that learns general representations at the form and rhythm levels. Additionally, we construct the largest heartbeat-based ECG vocabulary to date, which will further advance the development of ECG language processing. We evaluate HeartLang on six public ECG datasets, where it demonstrates robust competitiveness against other eSSL methods. Our data and code are publicly available at https://github.com/PKUDigitalHealth/HeartLang.
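
To make the "heartbeats as words" view concrete, here is a rough sketch of beat-centered segmentation, as opposed to fixed-step windows. It should be read as an assumption-laden illustration rather than the QRS-Tokenizer itself: the threshold-based peak detector, the parameters `threshold`, `refractory`, and `half_width`, and the function names `detect_r_peaks` and `to_ecg_sentence` are hypothetical, and real QRS detectors are considerably more robust.

```python
def detect_r_peaks(signal, threshold, refractory):
    """Return indices of local maxima above `threshold`, keeping only
    peaks at least `refractory` samples after the previous kept peak."""
    peaks = []
    for i in range(1, len(signal) - 1):
        is_local_max = signal[i - 1] < signal[i] >= signal[i + 1]
        if is_local_max and signal[i] > threshold:
            if not peaks or i - peaks[-1] >= refractory:
                peaks.append(i)
    return peaks

def to_ecg_sentence(signal, peaks, half_width):
    """Cut one window of 2 * half_width + 1 samples around each peak.
    Samples outside the recording are zero-padded."""
    words = []
    for p in peaks:
        window = []
        for j in range(p - half_width, p + half_width + 1):
            window.append(signal[j] if 0 <= j < len(signal) else 0.0)
        words.append(window)
    return words

# Toy single-lead signal: flat baseline with three spikes of unequal spacing.
signal = [0.0] * 30
for r in (5, 14, 25):
    signal[r - 1], signal[r], signal[r + 1] = 0.3, 1.0, 0.2

peaks = detect_r_peaks(signal, threshold=0.5, refractory=4)
sentence = to_ecg_sentence(signal, peaks, half_width=3)
print(peaks, len(sentence), sentence[0])
```

Because each window is anchored on a detected beat, every "word" shows the same part of the cardiac cycle regardless of heart rate, which is the property that fixed-size, fixed-step windows do not guarantee.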

Paper Structure

This paper contains 28 sections, 8 equations, 8 figures, and 10 tables.

Figures (8)

  • Figure 1: Two perspectives on ECG signals.
  • Figure 2: Framework of HeartLang.
  • Figure 3: The validation loss curve during VQ-HBR training (left) and the prediction accuracy curve during masked ECG sentence pre-training (right), shown from two perspectives.
  • Figure 4: ECG vocabulary visualization.
  • Figure 5: More ECG Words Visualization.
  • ...and 3 more figures