Exploring Best Practices for ECG Pre-Processing in Machine Learning

Amir Salimi, Sunil Vasu Kalmady, Abram Hindle, Osmar Zaiane, Padma Kaul

TL;DR

The paper tackles the lack of consensus on ECG pre-processing for machine learning by evaluating down-sampling, band-passing, and normalization across three large multi-label ECG datasets and three state-of-the-art time-series models. It demonstrates that reducing sampling rates to as low as 50 Hz can maintain comparable performance while substantially reducing training time and hardware needs, and that min-max normalization often harms performance while band-pass filtering yields no measurable gains. The findings emphasize that there is no one-size-fits-all pre-processing strategy; effectiveness depends on the label and model, underlining the value of automated or adaptive pre-processing approaches for ECG classification. The work provides practical guidance for researchers seeking resource-efficient training without sacrificing accuracy and calls for further development of automated pre-processing pipelines in time-series domains.

Abstract

In this work we search for best practices in the pre-processing of electrocardiogram (ECG) signals in order to train better classifiers for the diagnosis of heart conditions. State-of-the-art machine learning algorithms have achieved remarkable results in the classification of some heart conditions using ECG data, yet there appears to be no consensus on pre-processing best practices. Is this lack of consensus due to different conditions and architectures requiring different processing steps for optimal performance? Is it possible that state-of-the-art deep-learning models have rendered pre-processing unnecessary? We apply down-sampling, normalization, and filtering functions to three multi-label ECG datasets and measure their effects on three high-performing time-series classifiers. We find that sampling rates as low as 50 Hz can yield results comparable to the commonly used 500 Hz. This is significant because lower sampling rates result in smaller datasets and models, which require less time and fewer resources to train. Additionally, despite their common usage, we found min-max normalization to be slightly detrimental overall and band-passing to make no measurable difference. We found the blind approach to pre-processing of ECGs for multi-label classification to be ineffective, with the exception of sample-rate reduction, which reliably reduces computational resources but does not increase accuracy.
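To make two of the pre-processing steps named in the abstract concrete, here is a minimal Python sketch of down-sampling from 500 Hz to 50 Hz and of min-max normalization. It is an illustration under stated assumptions, not the authors' pipeline: the block-averaging down-sampler, the function names, and the synthetic sine-wave "lead" are all hypothetical, and the abstract does not say which resampling method the paper uses.

```python
import math
from typing import List, Sequence


def downsample_by_averaging(signal: Sequence[float], factor: int) -> List[float]:
    """Reduce the sampling rate by an integer factor.

    Each output sample is the mean of `factor` consecutive input samples,
    which acts as a crude low-pass filter before decimation. Trailing
    samples that do not fill a complete block are dropped.
    (Assumed method for illustration; not taken from the paper.)
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    n_blocks = len(signal) // factor
    return [
        sum(signal[i * factor:(i + 1) * factor]) / factor
        for i in range(n_blocks)
    ]


def min_max_normalize(signal: Sequence[float]) -> List[float]:
    """Rescale a signal linearly so its minimum maps to 0 and its maximum to 1."""
    lo, hi = min(signal), max(signal)
    if hi == lo:
        # A flat signal has no range to rescale; return zeros.
        return [0.0 for _ in signal]
    return [(x - lo) / (hi - lo) for x in signal]


# Hypothetical single-lead recording: 10 s at 500 Hz (5000 samples).
SOURCE_HZ = 500
TARGET_HZ = 50
lead = [math.sin(2 * math.pi * 1.2 * t / SOURCE_HZ) for t in range(10 * SOURCE_HZ)]

reduced = downsample_by_averaging(lead, SOURCE_HZ // TARGET_HZ)
scaled = min_max_normalize(reduced)
print(len(lead), len(reduced))  # 5000 500
```

The tenfold drop in sample count per lead is the source of the storage and training-time savings the abstract describes. Whether the normalization step helps is a separate question, and the abstract reports that it was slightly detrimental overall.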

Paper Structure (19 sections, 2 equations, 3 figures, 8 tables)

Figures (3)

  • Figure 1: Performance of the models for each disease at varying scaling rates. The 15 models are ranked against each other; a higher mean reciprocal rank (MRR) means better relative performance.
  • Figure 2: Performance of the models for each disease at varying high-pass frequencies. A higher MRR means better relative performance.
  • Figure 3: Performance of the models for each disease using different normalization types. A higher MRR means better relative performance.
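
The figure captions compare configurations by MRR, which in its standard usage means mean reciprocal rank. The sketch below shows one way such a score could be computed: rank the configurations within each trial, take the reciprocal of each rank, and average over trials. The function name, the tie handling, and the scores are hypothetical, and the captions do not specify the paper's exact ranking procedure.

```python
from typing import Dict, List


def mean_reciprocal_rank(scores_per_trial: List[Dict[str, float]]) -> Dict[str, float]:
    """Average 1/rank for each configuration over all trials.

    Within a trial, configurations are ranked by score (higher is better,
    rank 1 is best). Ties are broken by sort order, a simplification.
    Assumes every configuration appears in every trial.
    """
    totals: Dict[str, float] = {}
    for trial in scores_per_trial:
        ranked = sorted(trial, key=trial.get, reverse=True)
        for rank, name in enumerate(ranked, start=1):
            totals[name] = totals.get(name, 0.0) + 1.0 / rank
    return {name: total / len(scores_per_trial) for name, total in totals.items()}


# Made-up scores for three sampling-rate configurations over two trials.
trials = [
    {"500Hz": 0.90, "100Hz": 0.91, "50Hz": 0.89},
    {"500Hz": 0.88, "100Hz": 0.87, "50Hz": 0.86},
]
mrr = mean_reciprocal_rank(trials)
```

In this made-up example, "500Hz" and "100Hz" each rank first once and second once, so both score (1 + 1/2) / 2 = 0.75, while "50Hz" ranks third twice and scores 1/3. Because MRR is a relative measure, a low value only says a configuration was usually outranked, not that its absolute performance was poor.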