Signal and Noise Classification in Bio-Signals via Unsupervised Machine Learning
Sansrit Paudel
TL;DR
This work tackles noise contamination in biosignals (ECG and PPG) by applying unsupervised clustering to distinguish clean from noisy segments and to classify noise types. The methodology combines controlled data collection with time- and frequency-domain feature extraction, PCA-driven dimensionality reduction, and K-means/agglomerative clustering, evaluated against ground-truth labels using standard metrics. Findings show strong ability to preserve clean segments but limited discrimination among noise categories, highlighting the need for richer features and potentially more clusters; a binary clean-versus-noise detector performs robustly for preprocessing. The study contributes a reproducible workflow and open-source code/data, offering a practical preprocessing step to improve biosignal quality for downstream analyses.
Abstract
Real-world biosignal data is frequently corrupted by various types of noise, such as motion artifacts and baseline wander. Although digital signal processing techniques exist to process such signals, heavily degraded signals cannot be recovered. In this study, we address two tasks: first, a binary classification of biosignal segments as noisy or clean, and second, a categorization of the noise by type, such as motion artifacts and sensor failure. We implemented K-means clustering, and our results indicate that the algorithm separates clean segments from noisy ones more reliably than it distinguishes among noise categories, with particularly strong performance in identifying clean data. This approach enables the selection of only high-quality biosignal segments and supports more accurate feature engineering, which may enhance the precision of machine learning models trained on biosignals.
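The binary clean-versus-noisy step described above can be illustrated with a minimal sketch. It is not the study's implementation: the synthetic segments, the two time-domain features, the function names (make_segment, extract_features, two_means, cluster_segments), and the rule that maps a cluster to the "clean" label are all assumptions made for illustration. It omits PCA and frequency-domain features, and it uses a small hand-written two-cluster K-means so that it runs with the Python standard library alone; a real pipeline would more likely rely on an established clustering library.

```python
import math
import random
import statistics


def make_segment(rng, noisy, n=250, fs=125.0):
    # Synthetic stand-in for a recording (assumption): a 1.2 Hz sinusoid
    # with light Gaussian noise if clean, heavy Gaussian noise if noisy.
    sigma = 1.0 if noisy else 0.05
    return [math.sin(2 * math.pi * 1.2 * i / fs) + rng.gauss(0.0, sigma)
            for i in range(n)]


def extract_features(segment):
    # Two simple time-domain features (illustrative choices only):
    # standard deviation and mean absolute first difference.
    diffs = [abs(b - a) for a, b in zip(segment, segment[1:])]
    return (statistics.pstdev(segment), statistics.fmean(diffs))


def two_means(points, iters=100):
    # Lloyd's algorithm for k = 2 with a deterministic initialisation:
    # the first point and the point farthest from it.
    c0 = points[0]
    c1 = max(points, key=lambda p: math.dist(p, c0))
    centroids = [c0, c1]
    labels = []
    for _ in range(iters):
        new_labels = [
            0 if math.dist(p, centroids[0]) <= math.dist(p, centroids[1]) else 1
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = tuple(statistics.fmean(dim) for dim in zip(*members))
    return labels, centroids


def cluster_segments(segments):
    points = [extract_features(s) for s in segments]
    labels, centroids = two_means(points)
    # Cluster ids are arbitrary. Heuristic (assumption): the cluster whose
    # centroid has the smaller mean absolute difference is called "clean".
    clean_id = 0 if centroids[0][1] <= centroids[1][1] else 1
    return ["clean" if lab == clean_id else "noisy" for lab in labels]


rng = random.Random(0)
truth = ["clean"] * 20 + ["noisy"] * 20
segments = [make_segment(rng, noisy=(t == "noisy")) for t in truth]
predicted = cluster_segments(segments)

# Ground-truth labels are used only for evaluation, never for clustering.
accuracy = sum(p == t for p, t in zip(predicted, truth)) / len(truth)
print(f"accuracy vs. ground truth: {accuracy:.2f}")
```

The features here are left unscaled because their ranges are comparable in this toy setup. With a larger and more heterogeneous feature set, standardizing the features before clustering would usually be advisable.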
