A Benchmark Study of Deep Learning Methods for Multi-Label Pediatric Electrocardiogram-Based Cardiovascular Disease Classification

Yiqiao Chen

TL;DR

The study addresses the lack of benchmarks for multi-label pediatric ECG-based CVD classification by evaluating four deep learning paradigms (ResNet-1D, BiLSTM, Transformer, Mamba 2) on the ZZU-pECG dataset, across 9-lead and 12-lead configurations for 19 pediatric CVD labels. It establishes a standardized processing and evaluation framework that includes 3-second slices, down-sampling, stratified splits, and a weighted binary cross-entropy loss to handle class imbalance. All models perform well (F1-scores above 85% in most settings and Hamming Loss as low as 0.0069), with ResNet-1D achieving a macro-F1 of 94.67% on the 12-lead subset and Transformer showing gains in larger, more information-rich settings. Rare diseases remain challenging, especially in the 9-lead subset. The work highlights the need for larger, multi-center validation, age-stratified analyses, broader disease coverage, and cross-subject generalization strategies to enable real-world pediatric ECG deployment.
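The weighted binary cross-entropy mentioned above can be illustrated with a minimal sketch. The summary does not say how the class weights are chosen, so the ratio of negatives to positives per class used here is an assumption (a common choice, not necessarily the paper's), and the function names `positive_weights` and `weighted_bce` are hypothetical.

```python
import math

def positive_weights(labels):
    """Per-class weight n_neg / n_pos from multi-hot label rows.

    ASSUMPTION: the paper does not specify its weighting scheme;
    n_neg / n_pos is only one common convention.
    """
    n = len(labels)
    n_classes = len(labels[0])
    weights = []
    for c in range(n_classes):
        n_pos = sum(row[c] for row in labels)
        # Fall back to 1.0 when a class has no positives in this split.
        weights.append((n - n_pos) / n_pos if n_pos else 1.0)
    return weights

def weighted_bce(probs, targets, pos_weight, eps=1e-7):
    """Mean weighted binary cross-entropy over all samples and classes."""
    total = 0.0
    count = 0
    for p_row, y_row in zip(probs, targets):
        for p, y, w in zip(p_row, y_row, pos_weight):
            p = min(max(p, eps), 1 - eps)  # keep log() finite
            # Positive terms are scaled by w; negative terms are not.
            total += -(w * y * math.log(p) + (1 - y) * math.log(1 - p))
            count += 1
    return total / count

# Toy data: 4 recordings, 2 labels; the second label is rarer.
labels = [[1, 0], [1, 0], [0, 0], [0, 1]]
weights = positive_weights(labels)
probs = [[0.5, 0.5] for _ in labels]
loss = weighted_bce(probs, labels, weights)
```

Scaling only the positive term means a missed positive for a rare label costs more than a missed positive for a common one, which is the usual motivation for this loss under class imbalance.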

Abstract

Cardiovascular disease (CVD) is a major pediatric health burden, and early screening is critical. Electrocardiography (ECG), as a noninvasive and accessible tool, is well suited for this purpose. This paper presents the first benchmark study of deep learning for multi-label pediatric CVD classification on the recently released ZZU-pECG dataset, comprising 3716 recordings with 19 CVD categories. We systematically evaluate four representative paradigms (ResNet-1D, BiLSTM, Transformer, and Mamba 2) under both 9-lead and 12-lead configurations. All models achieved strong results, with Hamming Loss as low as 0.0069 and F1-scores above 85% in most settings. ResNet-1D reached a macro-F1 of 94.67% on the 12-lead subset, while BiLSTM and Transformer also showed competitive performance. Per-class analysis indicated challenges for rare conditions such as hypertrophic cardiomyopathy in the 9-lead subset, reflecting the effect of limited positive samples. This benchmark establishes reusable baselines and highlights complementary strengths across paradigms. It further points to the need for larger-scale, multi-center validation, age-stratified analysis, and broader disease coverage to support real-world pediatric ECG applications.
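For readers unfamiliar with the two headline metrics, the following sketch computes Hamming Loss and macro-F1 for multi-hot predictions using their standard definitions. It is illustrative only: the toy labels are invented, the function names are hypothetical, and scoring a class with no true or predicted positives as 0.0 is an assumption, since the abstract does not state how such classes are handled.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of (recording, label) slots that are predicted incorrectly."""
    wrong = sum(
        t != p
        for t_row, p_row in zip(y_true, y_pred)
        for t, p in zip(t_row, p_row)
    )
    return wrong / (len(y_true) * len(y_true[0]))

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    n_classes = len(y_true[0])
    scores = []
    for c in range(n_classes):
        pairs = [(t[c], p[c]) for t, p in zip(y_true, y_pred)]
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        denom = 2 * tp + fp + fn
        # ASSUMPTION: a class with no positives at all scores 0.0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n_classes

# Toy example: 4 recordings, 3 labels, one missed positive.
y_true = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]

hl = hamming_loss(y_true, y_pred)   # 1 wrong slot out of 12
f1 = macro_f1(y_true, y_pred)       # mean of per-class F1: 1, 2/3, 1
```

Because Hamming Loss averages over every label slot, it stays small when most labels are negative for most recordings, whereas macro-F1 gives each class equal weight and is therefore more sensitive to errors on rare conditions.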
