A Systematic Review of ECG Arrhythmia Classification: Adherence to Standards, Fair Evaluation, and Embedded Feasibility

Guilherme Silva, Pedro Silva, Gladston Moreira, Vander Freitas, Jadson Gertrudes, Eduardo Luz

TL;DR

This systematic review examines ECG arrhythmia classification literature from 2017 to 2024 through the E3C lens (Embedded, Clinical, Comparative Criteria). It reveals that only a small fraction of studies adhere simultaneously to AAMI guidelines, employ true inter-patient partitioning, and assess embedded feasibility, highlighting a gap between high accuracy and real-world deployment. Among the few that do meet E3C, Farag 2023 achieves top core-class performance with ultra-low latency, while Mao 2022 uniquely demonstrates on-device learning for patient-specific adaptation. The work advocates standardized reporting and benchmarking frameworks to enable fair comparisons and drive the development of clinically viable, resource-efficient ECG classification models for wearable and implanted systems.

Abstract

The classification of electrocardiogram (ECG) signals is crucial for early detection of arrhythmias and other cardiac conditions. However, despite advances in machine learning, many studies fail to follow standardization protocols, leading to inconsistencies in performance evaluation and real-world applicability. Additionally, hardware constraints essential for practical deployment, such as in pacemakers, Holter monitors, and wearable ECG patches, are often overlooked. Since real-world impact depends on feasibility in resource-constrained devices, ensuring efficient deployment is critical for continuous monitoring. This review systematically analyzes ECG classification studies published between 2017 and 2024, focusing on those adhering to the E3C (Embedded, Clinical, and Comparative Criteria), which include inter-patient paradigm implementation, compliance with Association for the Advancement of Medical Instrumentation (AAMI) recommendations, and model feasibility for embedded systems. While many studies report high accuracy, few properly consider patient-independent partitioning and hardware limitations. We identify state-of-the-art methods meeting E3C criteria and conduct a comparative analysis of accuracy, inference time, energy consumption, and memory usage. Finally, we propose standardized reporting practices to ensure fair comparisons and practical applicability of ECG classification models. By addressing these gaps, this study aims to guide future research toward more robust and clinically viable ECG classification systems.
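
The patient-independent (inter-patient) partitioning mentioned in the abstract can be illustrated with a minimal sketch. The beat records, patient identifiers, and function names below are hypothetical and are not taken from the reviewed studies; the sketch only contrasts a beat-level shuffle, which can place the same patient in both sets, with a split that assigns whole patients to one side.

```python
import random


def intra_patient_split(beats, test_fraction=0.5, seed=0):
    """Shuffle individual beats, so one patient may appear in both sets."""
    rng = random.Random(seed)
    shuffled = beats[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def inter_patient_split(beats, test_patients):
    """Assign every beat of a patient to exactly one side of the split."""
    test_patients = set(test_patients)
    train = [b for b in beats if b["patient"] not in test_patients]
    test = [b for b in beats if b["patient"] in test_patients]
    return train, test


def shared_patients(train, test):
    """Return the patients that occur in both sets (ideally none)."""
    return {b["patient"] for b in train} & {b["patient"] for b in test}


# Toy data (assumption): 4 hypothetical patients with 5 beats each.
beats = [
    {"patient": p, "beat_id": i, "label": "N"}
    for p in ("p1", "p2", "p3", "p4")
    for i in range(5)
]

train, test = inter_patient_split(beats, test_patients=["p3", "p4"])
print("inter-patient overlap:", shared_patients(train, test))

mixed_train, mixed_test = intra_patient_split(beats)
print("intra-patient overlap:", shared_patients(mixed_train, mixed_test))
```

With the patient-level split the overlap is empty by construction, which is the property the inter-patient paradigm requires; a beat-level shuffle gives no such guarantee.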

Paper Structure

This paper contains 19 sections, 4 figures, 9 tables.

Figures (4)

  • Figure 1: Flowchart of the article selection process. The systematic review followed a multi-stage selection process to refine the dataset and ensure the inclusion of relevant, high-quality studies. First, the Science Citation Index Expanded (SCI-EXPANDED) was chosen within Web of Science, reducing the dataset to 927 articles by excluding 500 papers from other indexes. Next, a title and abstract screening removed 274 irrelevant studies, leaving 653 articles. A quality review further excluded 32 articles due to duplication, non-English content, unsuitable formats, or retraction. The final selection was based on two criteria: citation-based filtering, which identified 100 highly cited articles, and feasibility-based filtering, which added 22 studies focused on embedded implementation. This process resulted in a final dataset of 122 articles for analysis.
  • Figure 2: Annotation example from the MIT-BIH database.
  • Figure 3: Distribution of the 122 selected articles based on adherence to AAMI standards, use of the inter-patient paradigm by De Chazal et al. (2004), utilization of the MIT-BIH database, and consideration of hardware feasibility. The bars represent the number of articles meeting each criterion, along with their respective proportions in the dataset.
  • Figure 4: Citation network of the reviewed studies in ECG arrhythmia classification. Each node represents an individual study, while edges indicate citation relationships. The direction of the arrows shows the citation flow, where the source node (origin of the arrow) is the citing article, and the target node (end of the arrow) is the cited article. The color of the nodes represents the number of times a study has been cited in the literature, with darker shades indicating higher citation counts. The shape of the nodes distinguishes whether the study followed the inter-patient paradigm proposed by De Chazal et al.: round nodes indicate studies that implemented this method, while rectangular nodes represent studies that did not follow this approach.
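
The selection counts reported in the Figure 1 caption can be cross-checked with a few lines of arithmetic. This is only a sketch: the variable names are illustrative, and the two derived values (the initial pool and the pool remaining after the quality review) are implied by the caption's numbers rather than stated in it.

```python
# Counts stated in the Figure 1 caption.
after_index_filter = 927         # articles kept after restricting to SCI-EXPANDED
excluded_other_indexes = 500     # papers dropped from other indexes
removed_by_screening = 274       # removed by title and abstract screening
after_screening = 653            # articles left after screening
removed_by_quality_review = 32   # duplicates, non-English, unsuitable format, retracted
highly_cited = 100               # citation-based filtering
embedded_focused = 22            # feasibility-based filtering
final_dataset = 122              # articles analysed in the review

# Derived values (assumption: implied by the caption, not stated in it).
initial_pool = after_index_filter + excluded_other_indexes
after_quality_review = after_screening - removed_by_quality_review

# Consistency checks between the stated counts.
assert after_index_filter - removed_by_screening == after_screening
assert highly_cited + embedded_focused == final_dataset

print("initial pool (derived):", initial_pool)
print("after quality review (derived):", after_quality_review)
print("final dataset:", final_dataset)
```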