A Benchmark Study of Deep Learning Methods for Multi-Label Pediatric Electrocardiogram-Based Cardiovascular Disease Classification

Yiqiao Chen

TL;DR

The study addresses the lack of benchmarks for multi-label pediatric ECG-based CVD classification by evaluating four deep learning paradigms (ResNet-1D, BiLSTM, Transformer, Mamba 2) on the ZZU-pECG dataset, across 9-lead and 12-lead configurations for 19 pediatric CVD labels. It establishes a standardized processing and evaluation framework, including 3-second slices, down-sampling, stratified splits, and a weighted binary cross-entropy loss to handle class imbalance. All models perform strongly (F1-scores above 85% in most settings and Hamming loss as low as 0.0069), with ResNet-1D achieving a macro-F1 of 94.67% on the 12-lead subset and Transformer showing gains in larger, more information-rich settings. Rare diseases remain challenging, especially in the 9-lead subset. The work highlights the need for larger, multi-center validation, age-stratified analyses, broader disease coverage, and cross-subject generalization strategies to enable real-world pediatric ECG deployment.
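To make the class-imbalance handling concrete, the following is a minimal sketch of a weighted binary cross-entropy loss for multi-label targets. The summary does not state how the weights are chosen, so the negatives-to-positives ratio used here is an assumption, and the function names `positive_class_weights` and `weighted_bce` are hypothetical rather than taken from the paper.

```python
import math


def positive_class_weights(labels):
    """Per-class weight for positive targets.

    ASSUMPTION: weight = (#negatives / #positives) per class. The paper's
    actual weighting scheme is not given in this summary.
    """
    n_samples = len(labels)
    n_classes = len(labels[0])
    weights = []
    for c in range(n_classes):
        positives = sum(row[c] for row in labels)
        negatives = n_samples - positives
        # Fall back to 1.0 when a class has no positives (avoids division by zero).
        weights.append(negatives / positives if positives > 0 else 1.0)
    return weights


def weighted_bce(probs, labels, pos_weights, eps=1e-7):
    """Mean weighted binary cross-entropy over all (sample, class) pairs.

    probs:  predicted probabilities in [0, 1], shape [n_samples][n_classes]
    labels: binary targets (0 or 1), same shape
    """
    total = 0.0
    count = 0
    for p_row, y_row in zip(probs, labels):
        for p, y, w in zip(p_row, y_row, pos_weights):
            p = min(max(p, eps), 1.0 - eps)  # clamp so log() stays finite
            # Only the positive term is up-weighted; negatives keep weight 1.
            total += -(w * y * math.log(p) + (1 - y) * math.log(1.0 - p))
            count += 1
    return total / count


# Toy example: 4 samples, 2 labels, each label positive in 1 of 4 samples.
labels = [[1, 0], [0, 0], [0, 1], [0, 0]]
weights = positive_class_weights(labels)
probs = [[0.5, 0.5] for _ in labels]
loss = weighted_bce(probs, labels, weights)
```

With this weighting, a missed positive for a rare label costs proportionally more than a false alarm, which is the usual motivation for weighting the loss when some diseases have few positive recordings.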

Abstract

Cardiovascular disease (CVD) is a major pediatric health burden, and early screening is of critical importance. Electrocardiography (ECG), as a noninvasive and accessible tool, is well suited for this purpose. This paper presents the first benchmark study of deep learning for multi-label pediatric CVD classification on the recently released ZZU-pECG dataset, comprising 3716 recordings with 19 CVD categories. We systematically evaluate four representative paradigms--ResNet-1D, BiLSTM, Transformer, and Mamba 2--under both 9-lead and 12-lead configurations. All models achieved strong results, with Hamming Loss as low as 0.0069 and F1-scores above 85% in most settings. ResNet-1D reached a macro-F1 of 94.67% on the 12-lead subset, while BiLSTM and Transformer also showed competitive performance. Per-class analysis indicated challenges for rare conditions such as hypertrophic cardiomyopathy in the 9-lead subset, reflecting the effect of limited positive samples. This benchmark establishes reusable baselines and highlights complementary strengths across paradigms. It further points to the need for larger-scale, multi-center validation, age-stratified analysis, and broader disease coverage to support real-world pediatric ECG applications.
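For readers unfamiliar with the two headline metrics, the sketch below computes Hamming loss and macro-F1 for binary multi-label predictions. It follows the standard definitions. The `skip_empty` option, which leaves classes without positive samples out of the macro average, is an assumption about how such classes might be handled and is not confirmed by this summary. The function names are hypothetical.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of (sample, label) pairs that are predicted incorrectly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for t_row, p_row in zip(y_true, y_pred)
        for t, p in zip(t_row, p_row)
    )
    return wrong / total


def macro_f1(y_true, y_pred, skip_empty=True):
    """Unweighted mean of per-class F1 scores.

    ASSUMPTION: with skip_empty=True, classes that have no positive samples
    in y_true are excluded from the average instead of being scored as 0.
    """
    n_classes = len(y_true[0])
    scores = []
    for c in range(n_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[c] == 1 and p[c] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[c] == 0 and p[c] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[c] == 1 and p[c] == 0)
        if skip_empty and tp + fn == 0:
            continue  # no positives for this class in the ground truth
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Toy example: 4 samples, 3 labels; the third label never occurs.
y_true = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 0]]
y_pred = [[1, 0, 0], [0, 0, 0], [1, 1, 0], [0, 1, 0]]
hl = hamming_loss(y_true, y_pred)
f1_skip = macro_f1(y_true, y_pred)
f1_all = macro_f1(y_true, y_pred, skip_empty=False)
```

The toy example shows why the two metrics should be read together: Hamming loss stays small whenever most labels are negative, while macro-F1 gives every class equal weight and therefore reacts strongly to errors on rare conditions.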

Paper Structure

This paper contains 8 sections, 2 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Schematic comparison of the four benchmark architectures evaluated in this study. From left to right: (a) a residual convolutional block (ResNet-1D), which captures local temporal patterns via stacked 1D convolutions and shortcut connections; (b) a bidirectional long short-term memory (BiLSTM) network, which models sequential dependencies in both forward and backward directions; (c) a Mamba 2 block, a recent state-space model (SSM) that combines input projection, depthwise convolution, and structured state-space dynamics for efficient long-range dependency modeling; (d) a Transformer encoder block, which leverages multi-head self-attention and feed-forward layers for global context modeling.
  • Figure 2: Example of a single-lead ECG slice.
  • Figure 3: Per-class performance comparison of four baseline models (ResNet-1D, BiLSTM, Mamba 2, and Transformer) on the 9-lead ECG subset. Panels (a), (b), and (c) respectively show the precision, recall, and F1-score for each of the 19 diagnostic classes. Scores are reported in percentage. Note that Class 7 was excluded from evaluation due to the absence of positive samples in this subset.
  • Figure 4: Per-class performance comparison of four baseline models (ResNet-1D, BiLSTM, Mamba 2, and Transformer) on the 12-lead ECG subset. Panels (a), (b), and (c) respectively show the precision, recall, and F1-score for each of the 19 diagnostic classes. Scores are reported in percentage.
  • Figure 5: Training dynamics of four baseline models (ResNet-1D, BiLSTM, Mamba 2, and Transformer) on the pediatric ECG multi-label CVD classification task. Panels (a) and (b) show results on the 9-lead and 12-lead subsets, respectively. Solid lines denote macro-F1 scores on the validation set.
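
The 3-second slicing and down-sampling mentioned in the TL;DR, which produce slices like the one in Figure 2, can be sketched as follows. This is an illustration under stated assumptions, not the paper's preprocessing code: the 500 Hz source rate, the factor-of-5 reduction, the block-averaging method, and the dropping of a trailing partial window are all hypothetical choices, as are the function names.

```python
def downsample(signal, factor):
    """Reduce the sampling rate by averaging non-overlapping blocks.

    ASSUMPTION: block averaging is used as a crude low-pass step; the paper's
    actual down-sampling method is not described in this summary.
    """
    usable = len(signal) - len(signal) % factor  # drop an incomplete last block
    return [
        sum(signal[i:i + factor]) / factor
        for i in range(0, usable, factor)
    ]


def slice_windows(signal, sample_rate_hz, window_seconds=3.0):
    """Cut a single-lead signal into non-overlapping fixed-length windows.

    A trailing remainder shorter than one window is discarded (assumption).
    """
    window = int(round(sample_rate_hz * window_seconds))
    return [
        signal[start:start + window]
        for start in range(0, len(signal) - window + 1, window)
    ]


# Toy example: a 10-second single-lead recording at a hypothetical 500 Hz.
raw = [float(i) for i in range(5000)]
reduced = downsample(raw, factor=5)               # 500 Hz -> 100 Hz
slices = slice_windows(reduced, sample_rate_hz=100)
```

For a multi-lead recording, the same two functions would be applied to each lead with identical window boundaries so that the leads stay aligned within a slice.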