Label Propagation Techniques for Artifact Detection in Imbalanced Classes using Photoplethysmogram Signals

Clara Macabiau, Thanh-Dung Le, Kevin Albert, Mana Shahriari, Philippe Jouvet, Rita Noumeir

TL;DR

This work tackles motion artifact detection in pediatric PPG signals under severe class imbalance and limited labeled data by applying label propagation (LP), a graph-based semi-supervised method. It builds a large pulse-level dataset from 1571 PICU patients, preprocesses the signals into pulses of length 256, and has experts label a small subset to seed LP, which then propagates labels through a KNN graph; oversampling with SMOTE further mitigates the imbalance. LP achieves high precision and recall for artifacts (≈0.90) with AUROC ≈0.98 and, with only 5% of the data labeled, outperforms several supervised baselines and neural architectures in this setting. The findings suggest LP is a practical, scalable approach for improving PPG artifact detection in clinical monitoring and could extend to other biomedical signals such as ECG or ABP.
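The idea of seeding a KNN graph with a few expert labels and letting them spread can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the 2-D points are hypothetical stand-ins for pulse features, the function names are invented for illustration, `k=3` is chosen for the toy data (the Figure 5 caption reports 7 neighbors), and SMOTE oversampling is left out.

```python
import math

def knn_graph(points, k):
    """For each point, return the indices of its k nearest neighbours."""
    neighbours = []
    for i, p in enumerate(points):
        # Sort the other points by distance; ties are broken by index.
        dists = sorted((math.dist(p, q), j)
                       for j, q in enumerate(points) if j != i)
        neighbours.append([j for _, j in dists[:k]])
    return neighbours

def label_propagation(points, seed_labels, n_classes, k=3, n_iter=100):
    """Spread seed labels over a KNN graph.

    seed_labels maps a point index to its known class. Seeds stay clamped;
    every other point repeatedly takes the average label distribution of
    its k nearest neighbours.
    """
    neighbours = knn_graph(points, k)
    n = len(points)
    # Unlabeled points start from a uniform distribution over classes.
    dist = [[1.0 / n_classes] * n_classes for _ in range(n)]
    for i, c in seed_labels.items():
        dist[i] = [1.0 if j == c else 0.0 for j in range(n_classes)]
    for _ in range(n_iter):
        updated = []
        for i in range(n):
            if i in seed_labels:
                updated.append(dist[i])  # clamp the expert label
            else:
                updated.append([sum(dist[j][c] for j in neighbours[i]) / k
                                for c in range(n_classes)])
        dist = updated
    # Predicted class = most probable label for each point.
    return [max(range(n_classes), key=lambda c: d[c]) for d in dist]

# Hypothetical toy data: 20 "clean" points and 5 "artifact" points,
# an imbalance loosely echoing the 82% / 18% split in the abstract.
CLEAN, ARTIFACT = 0, 1
points = [(float(i), 0.0) for i in range(20)]
points += [(100.0 + i, 0.0) for i in range(5)]

# Only two points carry an expert label; the rest are unlabeled.
seeds = {0: CLEAN, 22: ARTIFACT}

predicted = label_propagation(points, seeds, n_classes=2, k=3)
print(predicted)
```

In this toy setting the two groups are far apart, so each seed labels its whole group. Real pulses overlap far more, which is where the choice of neighbor count and the oversampling step matter.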

Abstract

This study investigates label propagation techniques for propagating labels among photoplethysmogram (PPG) signals, particularly under class imbalance and limited data availability, where artifact-contaminated PPG samples are significantly outnumbered by clean samples. We investigated a dataset comprising PPG recordings from 1571 patients, in which approximately 82% of the samples were identified as clean, while the remaining 18% were contaminated by artifacts. Our research compares the performance of supervised classifiers, both conventional classifiers and neural networks (Multi-Layer Perceptron (MLP), Transformers, Fully Convolutional Network (FCN)), with the semi-supervised Label Propagation (LP) algorithm for artifact classification in PPG signals. The results indicate that the LP algorithm achieves a precision of 91%, a recall of 90%, and an F1 score of 90% for the "artifacts" class, showcasing its effectiveness in annotating a medical dataset, even when the artifact class is rare. Although the supervised K-Nearest Neighbors (KNN) model demonstrated good results with a precision of 89%, a recall of 95%, and an F1 score of 92%, the semi-supervised algorithm excels in artifact detection. For imbalanced and limited data from the pediatric intensive care environment, the semi-supervised LP algorithm is promising for artifact detection in PPG signals. The results of this study are important for improving the accuracy of PPG-based health monitoring, particularly in situations in which motion artifacts pose challenges to data interpretation.
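As a quick consistency check on the figures quoted in the abstract, the F1 score is the harmonic mean of precision and recall, so it can be recomputed from the two reported values. The sketch below uses only the rounded percentages from the abstract, so the result is approximate; the helper name `f1_score` is chosen here for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Rounded values reported in the abstract for the "artifacts" class.
lp_f1 = f1_score(0.91, 0.90)   # semi-supervised Label Propagation
knn_f1 = f1_score(0.89, 0.95)  # supervised K-Nearest Neighbors

print(f"LP  F1 = {lp_f1:.3f}")
print(f"KNN F1 = {knn_f1:.3f}")
```

Rounded to two decimals, both values agree with the reported F1 scores of 90% for LP and 92% for KNN.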

Paper Structure

This paper contains 15 sections, 13 equations, 7 figures, and 6 tables.

Figures (7)

  • Figure 1: Workflow of the proposed method for detecting motion artifacts in PPG signals.
  • Figure 2: Example of a 10 s segment of a 30 s PPG signal: raw signal (top), filtered signal (middle), and segmented signal (bottom).
  • Figure 3: Example of a 10 s segment of a 30 s raw PPG signal. The blue box encloses all the pulses containing motion artifacts.
  • Figure 4: Comparison of label propagation between KNN and LP model with the "3 Bands dataset". (a) 3 initial annotated points (3 classes represented in green, red, and blue) and 178 non-annotated points (b) annotated dataset with KNN (c) with LP. From zhu2002.
  • Figure 5: Confusion matrix and ROC curve for the LP algorithm with a KNN kernel with 7 neighbors, an oversampling method SMOTE, and 5% of the dataset already labeled.
  • ...and 2 more figures