Signal and Noise Classification in Bio-Signals via unsupervised Machine Learning

Sansrit Paudel

TL;DR

This work tackles noise contamination in biosignals (ECG and PPG) by applying unsupervised clustering to distinguish clean from noisy segments and to classify noise types. The methodology combines controlled data collection with time- and frequency-domain feature extraction, PCA-driven dimensionality reduction, and K-means/agglomerative clustering, evaluated against ground-truth labels using standard metrics. Findings show strong ability to preserve clean segments but limited discrimination among noise categories, highlighting the need for richer features and potentially more clusters; a binary clean-versus-noise detector performs robustly for preprocessing. The study contributes a reproducible workflow and open-source code/data, offering a practical preprocessing step to improve biosignal quality for downstream analyses.

Abstract

Real-world biosignal data is frequently corrupted by various types of noise, such as motion artifacts and baseline wander. Although digital signal processing techniques exist to clean such signals, heavily degraded signals cannot be recovered. In this study, we address two tasks: first, a binary classification of biosignal segments as noisy or clean, and second, a categorization of the noise by type, such as motion artifacts and sensor failure. We implemented K-means clustering, and our results indicate that the algorithm reliably separates clean segments from noisy ones, with notably stronger performance in identifying clean data than in distinguishing among noise categories. This approach enables the selection of only high-quality biosignal segments and supports more accurate feature engineering, which may enhance the precision of machine learning models trained on biosignals.
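
The clean-versus-noise clustering idea summarized above can be illustrated with a minimal sketch. This is not the authors' code: the synthetic segments, the two time-domain features (standard deviation and mean absolute first difference), the farthest-first initialization, and all function names are assumptions made for illustration. PCA and frequency-domain features are omitted to keep the example dependency-free.

```python
import math
import random
import statistics


def make_segment(noisy, rng, n=250, fs=125.0):
    # Hypothetical stand-in for one signal window: a 1.2 Hz sinusoid plus
    # Gaussian noise (heavy when `noisy` is True, light otherwise).
    sigma = 1.0 if noisy else 0.05
    return [math.sin(2 * math.pi * 1.2 * i / fs) + rng.gauss(0.0, sigma)
            for i in range(n)]


def extract_features(x):
    # Two simple time-domain features (illustrative choices, not the paper's set).
    diffs = [abs(b - a) for a, b in zip(x, x[1:])]
    return [statistics.pstdev(x), statistics.fmean(diffs)]


def standardize(rows):
    # Z-score each feature column so both features weigh equally in distances.
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]


def kmeans(points, k, iters=100):
    # Lloyd's algorithm with a deterministic farthest-first initialization
    # (an assumption here; library implementations usually use k-means++).
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points,
                           key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [statistics.fmean(c) for c in zip(*members)]
    return labels


def binary_agreement(truth, labels):
    # Cluster ids are arbitrary, so score the better of the two label mappings.
    match = sum(t == lab for t, lab in zip(truth, labels)) / len(truth)
    return max(match, 1.0 - match)


rng = random.Random(0)
truth = [0] * 40 + [1] * 40  # 0 = clean, 1 = noisy (used only for evaluation)
segments = [make_segment(bool(t), rng) for t in truth]
X = standardize([extract_features(s) for s in segments])
labels = kmeans(X, k=2)
agreement = binary_agreement(truth, labels)
print(f"agreement with ground truth: {agreement:.2f}")
```

The ground-truth labels play no role in the clustering itself; they are used only afterwards to score the result, which mirrors the evaluation setup described in the TL;DR. Real recordings would need richer features than the two used here, particularly for separating noise types.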

Paper Structure

This paper contains 17 sections, 22 figures, 1 table.

Figures (22)

  • Figure 1: Physiological data collection protocol
  • Figure 2: Proposed System Flow
  • Figure 3: Silhouette score across different values of k for identifying the optimal cluster
  • Figure 4: Dimensionality reduction using PCA
  • Figure 5: Confusion matrix for ECG noise-type clustering
  • ...and 17 more figures