Label Propagation Techniques for Artifact Detection in Imbalanced Classes using Photoplethysmogram Signals
Clara Macabiau, Thanh-Dung Le, Kevin Albert, Mana Shahriari, Philippe Jouvet, Rita Noumeir
TL;DR
This work tackles motion artifact detection in pediatric PPG signals under severe class imbalance and limited labeled data by applying label propagation (LP), a graph-based semi-supervised method. It constructs a large pulse-level dataset from 1571 PICU patients, preprocesses the signals into 256-sample pulses, and uses expert labels on a small subset to seed LP, which propagates labels through a KNN graph; oversampling with SMOTE further mitigates the imbalance. LP achieves high precision and recall for artifacts (≈0.90) with AUROC ≈0.98 and, when seeded with 5% labeled data, outperforms several supervised baselines and neural architectures in this setting. The findings suggest LP is a practical, scalable approach for improving PPG artifact detection in clinical monitoring and could extend to other biomedical signals such as ECG or ABP.
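The propagation step described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the synthetic 32-sample pulses, the helper names `knn_affinity` and `propagate_labels`, the neighbor count `k=3`, and the iteration count are all hypothetical choices, and the SMOTE oversampling step is omitted.

```python
import numpy as np

def knn_affinity(X, k):
    """Symmetric 0/1 affinity matrix linking each sample to its k nearest neighbours."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)              # a sample is not its own neighbour
    nearest = np.argsort(d2, axis=1)[:, :k]
    W = np.zeros_like(d2)
    W[np.repeat(np.arange(len(X)), k), nearest.ravel()] = 1.0
    return np.maximum(W, W.T)                 # keep an edge if either end chose it

def propagate_labels(X, y, n_classes=2, k=3, n_iter=500):
    """y holds class ids for labelled samples and -1 for unlabelled ones."""
    W = knn_affinity(X, k)
    T = W / W.sum(axis=1, keepdims=True)      # row-normalised transition matrix
    idx = np.flatnonzero(y >= 0)
    seeds = np.zeros((len(X), n_classes))
    seeds[idx, y[idx]] = 1.0
    F = seeds.copy()
    for _ in range(n_iter):
        F = T @ F                             # spread label mass along graph edges
        F[idx] = seeds[idx]                   # clamp the expert-labelled samples
    return F.argmax(axis=1)

# Toy data (assumed, not from the paper): 82 clean pulses, 18 artifact pulses.
t = np.linspace(0.0, 1.0, 32)
clean = np.array([a * np.sin(np.pi * t) for a in np.linspace(0.9, 1.1, 82)])
artifact = np.array([np.sin(np.pi * t) + b * np.sin(8 * np.pi * t)
                     for b in np.linspace(0.8, 1.2, 18)])
X = np.vstack([clean, artifact])
y_true = np.array([0] * 82 + [1] * 18)        # 0 = clean, 1 = artifact

# Seed 5% of the samples with "expert" labels; the rest are unlabelled (-1).
y = np.full(100, -1)
y[[0, 27, 54, 81]] = 0
y[90] = 1

y_pred = propagate_labels(X, y)
accuracy = float((y_pred == y_true).mean())
print(accuracy)
```

On this well-separated toy set the two classes form disjoint graph components, so five seed labels are enough to annotate every pulse. Real PPG pulses overlap far more, which is where the choice of `k` and the handling of imbalance start to matter.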
Abstract
This study investigates the use of label propagation techniques to annotate photoplethysmogram (PPG) signals under class imbalance and limited data availability, where artifact-contaminated PPG samples are significantly outnumbered by clean samples. We analyzed a dataset comprising PPG recordings from 1571 patients, in which approximately 82% of the samples were identified as clean and the remaining 18% were contaminated by artifacts. We compare the performance of supervised classifiers, including conventional classifiers and neural networks (Multi-Layer Perceptron (MLP), Transformers, Fully Convolutional Network (FCN)), with the semi-supervised Label Propagation (LP) algorithm for artifact classification in PPG signals. The results indicate that the LP algorithm achieves a precision of 91%, a recall of 90%, and an F1 score of 90% for the "artifacts" class, demonstrating its effectiveness in annotating a medical dataset even when artifact samples are rare. Although the supervised K-Nearest Neighbors (KNN) model also performed well, with a precision of 89%, a recall of 95%, and an F1 score of 92%, the semi-supervised algorithm excels in artifact detection. For imbalanced and limited data from a pediatric intensive care environment, the semi-supervised LP algorithm is promising for artifact detection in PPG signals. These results are important for improving the accuracy of PPG-based health monitoring, particularly in situations where motion artifacts make the data difficult to interpret.
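As a quick consistency check on the reported "artifacts" class metrics, the F1 score can be recomputed from precision and recall as their harmonic mean. The helper name `harmonic_f1` is hypothetical; the input values are the ones quoted in the abstract.

```python
def harmonic_f1(precision, recall):
    """F1 score as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

lp_f1 = harmonic_f1(0.91, 0.90)    # Label Propagation, "artifacts" class
knn_f1 = harmonic_f1(0.89, 0.95)   # supervised KNN, "artifacts" class

print(f"LP  F1 = {lp_f1:.2f}")     # LP  F1 = 0.90
print(f"KNN F1 = {knn_f1:.2f}")    # KNN F1 = 0.92
```

Both recomputed values match the F1 scores reported above after rounding to two decimals.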
