
ALPEC: A Comprehensive Evaluation Framework and Dataset for Machine Learning-Based Arousal Detection in Clinical Practice

Stefan Kraft, Andreas Theissler, Vera Wienhausen-Wilke, Philipp Walter, Gjergji Kasneci

TL;DR

ALPEC, a novel post-processing and evaluation framework emphasizing approximate localization and precise event count of arousals, and CPS, a novel comprehensive polysomnographic dataset that reflects clinical annotation practice (only arousal onsets are annotated) and includes modalities not present in existing polysomnographic datasets.

Abstract

Detecting arousals in sleep is essential for diagnosing sleep disorders. However, the use of Machine Learning (ML) in clinical practice is impeded by fundamental issues, primarily mismatches between clinical protocols and ML methods. Clinicians typically annotate only the onset of arousals, while ML methods rely on annotations of both the beginning and the end. Additionally, there is no standardized evaluation methodology for arousal detection models that is tailored to clinical needs. This work addresses these issues by introducing a novel post-processing and evaluation framework emphasizing approximate localization and precise event count (ALPEC) of arousals. We recommend that ML practitioners focus on detecting arousal onsets, in line with clinical practice. We examine the impact of this shift on current training and evaluation schemes, addressing the resulting simplifications and challenges. We use a novel comprehensive polysomnographic dataset (CPS) that reflects the aforementioned clinical annotation constraints and includes modalities not present in existing polysomnographic datasets. We release the dataset alongside this paper and demonstrate the benefits of leveraging multimodal data for arousal onset detection. Our findings contribute to integrating ML-based arousal detection into clinical settings, narrowing the gap between technological advancements and clinical needs.
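To make the idea of "approximate localization and precise event count" concrete, here is a minimal sketch under stated assumptions. It is not the ALPEC algorithm from the paper; the function names (`probabilities_to_onsets`, `match_onsets`, `f1_from_counts`) and parameters (`threshold`, `min_gap`, `tolerance`) are hypothetical. It turns a per-sample arousal probability into onset events by taking rising edges above a threshold, then scores those onsets against annotated onsets with greedy one-to-one matching inside a tolerance window. Each reference onset can be matched at most once, so surplus or missing detections show up directly as false positives or false negatives.

```python
from typing import List, Optional, Tuple


def probabilities_to_onsets(probs: List[float], threshold: float = 0.5,
                            min_gap: int = 0) -> List[int]:
    """Hypothetical post-processing: return the sample indices where the
    probability crosses `threshold` from below (rising edges). A rising edge
    at most `min_gap` samples after the previously kept onset is dropped."""
    onsets: List[int] = []
    prev_above = False
    for i, p in enumerate(probs):
        above = p >= threshold
        if above and not prev_above:
            if not onsets or i - onsets[-1] > min_gap:
                onsets.append(i)
        prev_above = above
    return onsets


def match_onsets(predicted: List[int], reference: List[int],
                 tolerance: int) -> Tuple[int, int, int]:
    """Greedy one-to-one matching (an assumed scheme, not necessarily the
    paper's): each reference onset takes the nearest unused predicted onset
    within +/- `tolerance` samples. Greedy matching is not guaranteed to be
    optimal. Returns (true positives, false positives, false negatives)."""
    used = [False] * len(predicted)
    tp = 0
    for r in sorted(reference):
        best: Optional[int] = None
        best_dist: Optional[int] = None
        for j, p in enumerate(predicted):
            d = abs(p - r)
            if not used[j] and d <= tolerance and (best_dist is None or d < best_dist):
                best, best_dist = j, d
        if best is not None:
            used[best] = True
            tp += 1
    return tp, len(predicted) - tp, len(reference) - tp


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """Event-level F1 from match counts; returns 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    probs = [0.1, 0.7, 0.8, 0.2, 0.1, 0.9, 0.3, 0.6, 0.7, 0.1]
    pred = probabilities_to_onsets(probs, threshold=0.5)     # [1, 5, 7]
    tp, fp, fn = match_onsets(pred, reference=[2, 8], tolerance=1)
    print(pred, (tp, fp, fn), f1_from_counts(tp, fp, fn))
```

In the usage example, the predicted onsets at samples 1 and 7 fall within one sample of the annotated onsets at 2 and 8, while the onset at 5 has no partner and counts as a false positive. This gives (2, 1, 0) and an F1 of 0.8. The tolerance controls how approximate localization may be, whereas the one-to-one constraint keeps the event count strict.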

Paper Structure (45 sections, 11 equations, 1 figure, 15 tables, 1 algorithm)

Figures (1)

  • Figure 1: Schematic illustration of different approaches for training and evaluating arousal detection models. In schemas S1, S2, S3, and S7, green lines and areas represent target points of the positive class (arousal), while the empty areas in between contain points of the negative class (no arousal). Blue lines represent points predicted to belong to the positive class. For schemas containing pointwise evaluations (S4-S6), all points marked in green or blue are considered to be in the positive class, while all other points are considered to be in the negative class. Names of training schemes are highlighted in red, names of evaluation schemes in blue. All sizes and dimensions are illustrative and not to scale. In particular, schemas containing pointwise evaluations would contain many more data points inside events/intervals (S4-S5) and between events (S4-S6). For schemas containing window-based approaches (S1-S3), each box represents a fixed-length window containing many data points, and every point in the window takes the class identification or evaluation outcome given by the label on the box.
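The caption distinguishes window-based schemas (S1-S3), where every point inherits its box's label, from pointwise evaluation (S4-S6). The following sketch shows how that difference plays out on a toy sequence. It rests on one assumption in particular: a window is labeled positive if any point inside it is positive. The paper's schemas may use a different rule. The helper names (`window_labels`, `expand_windows`, `pointwise_confusion`) are hypothetical.

```python
from typing import List, Tuple


def window_labels(points: List[int], window: int) -> List[int]:
    """Assumed rule: a fixed-length window is positive (1) if any point in it
    is positive. The last window may be shorter than `window`."""
    return [int(any(points[i:i + window])) for i in range(0, len(points), window)]


def expand_windows(labels: List[int], window: int, n_points: int) -> List[int]:
    """Give every point the label of the window that contains it,
    mirroring how window-based schemas assign outcomes to points."""
    expanded = [lab for lab in labels for _ in range(window)]
    return expanded[:n_points]


def pointwise_confusion(target: List[int],
                        predicted: List[int]) -> Tuple[int, int, int, int]:
    """Pointwise counts (tp, fp, fn, tn) over two equal-length 0/1 sequences."""
    if len(target) != len(predicted):
        raise ValueError("sequences must have the same length")
    pairs = list(zip(target, predicted))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    return tp, fp, fn, tn


if __name__ == "__main__":
    target = [0, 0, 1, 1, 0, 0, 0, 0, 1, 0]
    windows = window_labels(target, window=4)                  # [1, 0, 1]
    as_points = expand_windows(windows, 4, len(target))
    print(windows, pointwise_confusion(target, as_points))     # (3, 3, 0, 4)
```

Even when the window labels are derived directly from the ground truth, scoring them pointwise produces three false positives on this toy sequence. This is one reason the granularity of a training or evaluation scheme matters when results are compared.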