A Systematic Review of ECG Arrhythmia Classification: Adherence to Standards, Fair Evaluation, and Embedded Feasibility
Guilherme Silva, Pedro Silva, Gladston Moreira, Vander Freitas, Jadson Gertrudes, Eduardo Luz
TL;DR
This systematic review examines the ECG arrhythmia classification literature from 2017 to 2024 through the lens of E3C (Embedded, Clinical, and Comparative Criteria). It finds that only a small fraction of studies simultaneously adhere to AAMI guidelines, employ true inter-patient partitioning, and assess embedded feasibility, which exposes a gap between reported accuracy and real-world deployment. Among the few studies that meet E3C, Farag 2023 achieves the top core-class performance with ultra-low latency, while Mao 2022 is the only one to demonstrate on-device learning for patient-specific adaptation. The work advocates standardized reporting and benchmarking frameworks to enable fair comparisons and to drive the development of clinically viable, resource-efficient ECG classification models for wearable and implantable systems.
Abstract
The classification of electrocardiogram (ECG) signals is crucial for the early detection of arrhythmias and other cardiac conditions. However, despite advances in machine learning, many studies do not follow standardized evaluation protocols, which leads to inconsistent performance assessment and limited real-world applicability. In addition, the hardware constraints that govern practical deployment in devices such as pacemakers, Holter monitors, and wearable ECG patches are often overlooked. Because real-world impact depends on whether a model can run on resource-constrained devices, efficient deployment is critical for continuous monitoring. This review systematically analyzes ECG classification studies published between 2017 and 2024, focusing on those that satisfy the E3C (Embedded, Clinical, and Comparative Criteria): implementation of the inter-patient paradigm, compliance with the Association for the Advancement of Medical Instrumentation (AAMI) recommendations, and model feasibility on embedded systems. While many studies report high accuracy, few properly account for patient-independent partitioning and hardware limitations. We identify state-of-the-art methods that meet E3C and compare them in terms of accuracy, inference time, energy consumption, and memory usage. Finally, we propose standardized reporting practices to ensure fair comparisons and the practical applicability of ECG classification models. By addressing these gaps, this study aims to guide future research toward more robust and clinically viable ECG classification systems.
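To make two of the E3C criteria concrete, the following Python sketch maps MIT-BIH beat annotation symbols to the five AAMI classes (N, S, V, F, Q), following the grouping commonly used with ANSI/AAMI EC57, and checks whether a train/test split is inter-patient, i.e., whether any patient contributes beats to both sets. It is a minimal illustration under stated assumptions, not the procedure of any reviewed study: the helper names (AAMI_CLASSES, to_aami, inter_patient_split, is_inter_patient), the (patient_id, label) beat representation, and the patient IDs are all hypothetical.

```python
# Hypothetical helpers illustrating AAMI class grouping and inter-patient splitting.

# MIT-BIH beat symbols grouped into the five AAMI classes (as commonly used with EC57).
AAMI_CLASSES = {
    "N": "N", "L": "N", "R": "N", "e": "N", "j": "N",  # normal, bundle branch block, escape beats
    "A": "S", "a": "S", "J": "S", "S": "S",            # supraventricular ectopic beats (SVEB)
    "V": "V", "E": "V",                                # ventricular ectopic beats (VEB)
    "F": "F",                                          # fusion of ventricular and normal beats
    "/": "Q", "f": "Q", "Q": "Q",                      # paced, paced fusion, unclassifiable
}


def to_aami(symbols):
    """Map beat symbols to AAMI classes, dropping non-beat annotations such as '+'."""
    return [AAMI_CLASSES[s] for s in symbols if s in AAMI_CLASSES]


def inter_patient_split(beats, test_patients):
    """Split (patient_id, label) beats so each patient lands entirely in train or test."""
    test_patients = set(test_patients)
    train = [b for b in beats if b[0] not in test_patients]
    test = [b for b in beats if b[0] in test_patients]
    return train, test


def is_inter_patient(train, test):
    """True when no patient ID appears in both the training and the test set."""
    return not ({pid for pid, _ in train} & {pid for pid, _ in test})


if __name__ == "__main__":
    # Hypothetical beats from four patients, already mapped to AAMI labels.
    beats = [("p1", "N"), ("p1", "S"), ("p2", "V"), ("p2", "N"), ("p3", "N"), ("p4", "S")]

    train, test = inter_patient_split(beats, test_patients={"p3", "p4"})
    print(is_inter_patient(train, test))  # True

    # A beat-level split that ignores patient identity places p1 and p2 in both sets.
    print(is_inter_patient(beats[::2], beats[1::2]))  # False

    print(to_aami(["N", "A", "V", "+", "/", "F"]))  # ['N', 'S', 'V', 'Q', 'F']
```

The beat-level split in the example is the kind of intra-patient partitioning that lets a model see the same patient's morphology during training and testing, which tends to inflate reported accuracy; checking disjointness of patient IDs is one simple guard against it.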
