
RawECGNet: Deep Learning Generalization for Atrial Fibrillation Detection from the Raw ECG

Noam Ben-Moshe, Kenta Tsutsui, Shany Biton, Leif Sörnmo, Joachim A. Behar

TL;DR

RawECGNet introduces a two-stage, morphology-aware deep learning architecture that detects AF and AFl from the raw single-lead ECG. It generalizes better across leads and external cohorts than ArNet2, a rhythm-based model that takes RR intervals as input. The model combines a ResNet-based encoder containing a domain-shifts-with-uncertainty (DSU) layer with a BiGRU that adds temporal context. Together with training on multiple leads, this design is intended to withstand distribution shifts due to geography, lead position, and demographics. Evaluation on UVAF, RBDB, and SHDB shows higher per-window F1 scores and substantially lower AF burden estimation error. Ablation and error analyses identify the main drivers of performance and the remaining challenges. The work underscores the value of morphology information and cross-lead training for robust AF/AFl detection. It points to future directions, including 12-lead data and ECG foundation models, for further improving generalization in diverse real-world settings.
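
The DSU layer in the first-step encoder (Figure 1) is not specified further in this summary. The sketch below assumes the general DSU formulation. During training, each instance's per-channel feature mean and standard deviation are replaced by noisy versions, and the noise scale is set by how much those statistics vary across the batch. The effect is to simulate unseen shifts in feature distribution. The function name `dsu`, the application probability `p=0.5`, and `eps` are illustrative assumptions, not the authors' code. Plain Python over nested lists is used only to make the arithmetic visible.

```python
import math
import random
from typing import List, Optional

# Feature map laid out as [batch][channel][time].
Tensor3 = List[List[List[float]]]


def _mean(xs: List[float]) -> float:
    return sum(xs) / len(xs)


def _std(xs: List[float], eps: float) -> float:
    # Population standard deviation, stabilised by eps inside the square root.
    m = _mean(xs)
    return math.sqrt(sum((v - m) ** 2 for v in xs) / len(xs) + eps)


def dsu(x: Tensor3, p: float = 0.5, eps: float = 1e-6,
        rng: Optional[random.Random] = None) -> Tensor3:
    """Hypothetical DSU sketch: perturb per-instance channel statistics (training only)."""
    rng = rng or random.Random()
    if rng.random() >= p:
        return x  # perturbation skipped for this batch
    n_batch, n_chan = len(x), len(x[0])
    mu = [[_mean(x[b][c]) for c in range(n_chan)] for b in range(n_batch)]
    sig = [[_std(x[b][c], eps) for c in range(n_chan)] for b in range(n_batch)]
    # Uncertainty of each statistic = its standard deviation across the batch.
    sd_mu = [_std([mu[b][c] for b in range(n_batch)], eps) for c in range(n_chan)]
    sd_sig = [_std([sig[b][c] for b in range(n_batch)], eps) for c in range(n_chan)]
    out: Tensor3 = []
    for b in range(n_batch):
        out_b = []
        for c in range(n_chan):
            beta = mu[b][c] + rng.gauss(0.0, 1.0) * sd_mu[c]     # perturbed mean
            gamma = sig[b][c] + rng.gauss(0.0, 1.0) * sd_sig[c]  # perturbed std
            # Normalise with the original statistics, re-scale with the perturbed ones.
            out_b.append([gamma * (v - mu[b][c]) / sig[b][c] + beta for v in x[b][c]])
        out.append(out_b)
    return out
```

When all instances in a batch share the same statistics, the batch-level spread is near zero and the output stays close to the input. The perturbation therefore grows with the diversity the batch already contains.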

Abstract

Introduction: Deep learning models that detect episodes of atrial fibrillation (AF) from rhythm information in long-term, ambulatory ECG recordings have shown high performance. However, the rhythm-based approach does not exploit the morphological information conveyed by the ECG waveforms, particularly the f-waves. As a result, the performance of such models may be inherently limited.

Methods: To address this limitation, we developed a deep learning model, named RawECGNet, that detects episodes of AF and atrial flutter (AFl) from the raw, single-lead ECG. We evaluate the generalization performance of RawECGNet on two external data sets that account for distribution shifts in geography, ethnicity, and lead position. RawECGNet is further benchmarked against ArNet2, a state-of-the-art deep learning model that uses rhythm information as input.

Results: On the external test sets, RawECGNet achieved F1 scores of 0.91–0.94 across leads in RBDB and 0.93 in SHDB, compared with 0.89–0.91 and 0.91, respectively, for ArNet2. These results highlight RawECGNet as a high-performance, generalizable algorithm for detecting AF and AFl episodes that exploits both rhythm and morphology information.
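
The F1 scores above are per-window scores for the positive (AF/AFl) class. The following minimal sketch shows how such a score is computed from 30-s window labels. The 0.5 decision threshold and the helper names `to_labels` and `f1_score` are assumptions for illustration.

```python
from typing import List, Sequence


def to_labels(probs: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Binarise per-window probabilities (the threshold is an assumption)."""
    return [1 if p >= threshold else 0 for p in probs]


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 of the positive (AF/AFl) class over 30-s windows."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    # F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall.
    return 2 * tp / denom if denom else 0.0


y_true = [1, 1, 0, 0, 1]
y_pred = to_labels([0.9, 0.2, 0.1, 0.7, 0.6])  # -> [1, 0, 0, 1, 1]
print(round(f1_score(y_true, y_pred), 3))  # 0.667
```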

Paper Structure (16 sections, 2 equations, 6 figures, 2 tables)

Figures (6)

  • Figure 1: RawECGNet architecture. (a) Network for training (first step), consisting of residual network blocks (ResBlocks), shrink blocks, batch normalization (BN), a domain-shifts-with-uncertainty (DSU) layer, and dense blocks. (b) The ResBlock architecture. (c) The second step, consisting of a bidirectional GRU (BiGRU) unit and a dense block. The first step is trained for binary classification of 30-s windows, and the features extracted for each window are used as input to the second step. Each window is concatenated with $p$ preceding and $s$ succeeding windows. A dense layer then outputs, for each window, the probability of AF/AFl.
  • Figure 2: Performance of AF/AFl window classification for ArNet2 and RawECGNet, presented for each lead and each data set. The median bootstrap F1 is shown, with error bars spanning Q1 to Q3. The following windows are analyzed: (a) all windows, (b) AF and non-AF/AFl windows only, and (c) AFl and non-AF/AFl windows only.
  • Figure 3: Performance of AF/AFl window classification across (a) age and (b) sex. The median bootstrap F1 is shown, with error bars spanning Q1 to Q3. Results are presented for the combined test set, and the number of patients ($p$) is displayed for each group.
  • Figure 4: Histograms of the AF burden (AFB) error $E_{\mathrm{AF}}$ (%), computed for the combined test sets across all leads and grouped by the following labels: (a) non-AF, (b) mild AF, (c) moderate AF, and (d) severe AF.
  • Figure 5: Ablation study of the different components of RawECGNet.
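
Figure 1(c) describes how the second step sees each 30-s window in context. The sketch below covers only that windowing step. The per-window feature vectors from the first step are grouped with $p$ preceding and $s$ succeeding windows into a sequence that a BiGRU could consume. The caption does not state how recording edges are handled or which $p$ and $s$ were used. The zero-padding at the edges, the function name `context_sequences`, and the toy values are therefore assumptions.

```python
from typing import List

Vector = List[float]


def context_sequences(features: List[Vector], p: int, s: int) -> List[List[Vector]]:
    """For window i, return [f[i-p], ..., f[i], ..., f[i+s]] (length p + s + 1)."""
    if not features:
        return []
    n, dim = len(features), len(features[0])
    seqs: List[List[Vector]] = []
    for i in range(n):
        seq = []
        for j in range(i - p, i + s + 1):
            # Hypothetical edge handling: zero vectors beyond the recording ends.
            seq.append(features[j] if 0 <= j < n else [0.0] * dim)
        seqs.append(seq)
    return seqs


feats = [[1.0], [2.0], [3.0]]  # one toy 1-D feature per 30-s window
print(context_sequences(feats, p=1, s=1)[0])  # [[0.0], [1.0], [2.0]]
```

Each sequence has a fixed length of p + s + 1. A per-window probability can then be read from the BiGRU output at the centre position.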
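
Figures 2 and 3 report the median F1 with Q1 and Q3 over bootstrap resamples. The minimal sketch below assumes that patients are resampled with replacement and that their windows are pooled before F1 is computed. The resampling unit, `n_boot=1000`, and the seed are assumptions, not the authors' protocol. `statistics.quantiles(..., n=4)` returns the three quartile cut points.

```python
import random
import statistics
from typing import List, Sequence, Tuple

Patient = Tuple[List[int], List[int]]  # (reference, predicted) labels per window


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def bootstrap_f1(patients: List[Patient], n_boot: int = 1000,
                 seed: int = 0) -> Tuple[float, float, float]:
    """Return (Q1, median, Q3) of F1 over patient-level bootstrap resamples."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_boot):
        sample = [rng.choice(patients) for _ in patients]  # resample with replacement
        y_true = [t for ref, _ in sample for t in ref]      # pool windows
        y_pred = [p for _, pred in sample for p in pred]
        scores.append(f1_score(y_true, y_pred))
    q1, median, q3 = statistics.quantiles(scores, n=4)
    return q1, median, q3
```

Resampling whole patients rather than individual windows keeps the correlated windows of one recording together, which usually gives wider, more honest intervals.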
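
Figure 4 plots the AF burden error $E_{\mathrm{AF}}$ per burden group. The exact definition is not given in this summary, so the sketch makes two assumptions. First, AFB is taken as the percentage of equal-length 30-s windows labelled AF/AFl in a recording. Second, $E_{\mathrm{AF}}$ is taken as the signed difference between predicted and reference burden, in percentage points. The paper may define it differently, for example as an absolute error. The thresholds for mild, moderate, and severe AF are also not stated here, so they are not modelled.

```python
from typing import Sequence


def af_burden(labels: Sequence[int]) -> float:
    """AF burden (%) = share of 30-s windows labelled AF/AFl in one recording."""
    if not labels:
        raise ValueError("recording has no windows")
    return 100.0 * sum(labels) / len(labels)


def af_burden_error(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Hypothetical signed E_AF in percentage points (predicted minus reference)."""
    return af_burden(y_pred) - af_burden(y_true)


print(af_burden_error([1, 1, 0, 0], [1, 1, 1, 0]))  # 25.0
```

Because all windows have the same duration, the fraction of windows equals the fraction of recorded time. A signed error also shows whether a model tends to over- or under-estimate burden.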