Two-Stage Hierarchical and Explainable Feature Selection Framework for Dimensionality Reduction in Sleep Staging

Yangfan Deng, Hamad Albidah, Ahmed Dallal, Jijun Yin, Zhi-Hong Mao

TL;DR

The paper tackles the challenge of visualizing and classifying high-dimensional EEG data for sleep staging. It introduces a two-stage hierarchical and explainable feature selection framework that combines RFECV-based feature pruning with three dimensionality reduction methods (PCA, t-SNE, and UMAP), augmented by persistent-homology topological features derived from Takens-embedded EEG data. Empirical results on Sleep-EDF show that adding topological features to spectro-temporal features improves classification: t-SNE achieves the highest accuracy (79.8%), while UMAP offers a practical balance of performance and efficiency. The work advances explainable analysis of sleep-related EEG patterns by integrating topology-driven structural information with traditional features and a transparent feature-selection process.
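To make the Takens embedding step concrete, the following minimal sketch turns a one-dimensional series into a point cloud of delay vectors, which is the kind of object persistent homology is then computed on. The function name `takens_embedding`, the parameter values, and the sine-wave stand-in for an EEG epoch are illustrative assumptions, not the embedding settings used in the paper.

```python
import math


def takens_embedding(signal, dimension=3, delay=1):
    """Return delay vectors (x[t], x[t+delay], ..., x[t+(dimension-1)*delay])."""
    if dimension < 1 or delay < 1:
        raise ValueError("dimension and delay must be positive integers")
    n_vectors = len(signal) - (dimension - 1) * delay
    if n_vectors <= 0:
        raise ValueError("signal is too short for this dimension and delay")
    return [
        tuple(signal[t + k * delay] for k in range(dimension))
        for t in range(n_vectors)
    ]


# Toy stand-in for an EEG epoch (assumption): a sine wave with period 20 samples.
toy_signal = [math.sin(2 * math.pi * t / 20) for t in range(100)]

# Hypothetical parameters; the paper's dimension and delay are not given here.
cloud = takens_embedding(toy_signal, dimension=3, delay=5)
print(len(cloud))  # number of 3-D points in the embedded cloud
```

A periodic signal traces a closed loop in the embedded space, which is the sort of structure a one-dimensional persistent-homology feature would register.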

Abstract

Sleep is crucial for human health, and EEG signals play a significant role in sleep research. Because EEG sequences are high-dimensional, visualizing the data and clustering the different sleep stages remain challenging. To address these issues, we propose a two-stage hierarchical and explainable feature selection framework that incorporates a feature selection algorithm to improve the performance of dimensionality reduction. Inspired by topological data analysis, which can analyze the structure of high-dimensional data, we extract topological features from the EEG signals to compensate for the structural information lost in traditional spectro-temporal analysis. Topological visualizations of data from different sleep stages, together with the classification results, show that the proposed features effectively supplement traditional features. Finally, we compare the performance of three dimensionality reduction algorithms: Principal Component Analysis (PCA), t-Distributed Stochastic Neighbor Embedding (t-SNE), and Uniform Manifold Approximation and Projection (UMAP). Among them, t-SNE achieved the highest accuracy (79.8%), but when computational cost is weighed alongside the performance metrics, UMAP is the best overall choice.
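One plausible reading of "feature selection followed by dimensionality reduction" can be sketched with scikit-learn: RFECV prunes features by cross-validated recursive elimination, and the surviving features are projected to two dimensions. This is only an illustration under stated assumptions. The synthetic data, the logistic-regression estimator, the fold count, and the use of PCA as the reducer are choices made here, not details confirmed by the paper.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a per-epoch feature matrix (assumption, not Sleep-EDF).
X, y = make_classification(
    n_samples=300,
    n_features=20,
    n_informative=5,
    n_redundant=2,
    n_classes=3,
    n_clusters_per_class=1,
    random_state=0,
)
X_scaled = StandardScaler().fit_transform(X)

# Feature pruning: recursive feature elimination with cross-validation.
# The estimator and scoring metric are illustrative choices.
selector = RFECV(
    estimator=LogisticRegression(max_iter=1000),
    step=1,
    cv=StratifiedKFold(n_splits=5),
    scoring="accuracy",
    min_features_to_select=2,
)
X_selected = selector.fit_transform(X_scaled, y)

# Dimensionality reduction on the retained features (PCA chosen for brevity).
embedding = PCA(n_components=2).fit_transform(X_selected)

print("features kept:", selector.n_features_)
print("embedding shape:", embedding.shape)
```

The boolean mask `selector.support_` records which input features were kept, which is what makes this kind of pruning step inspectable.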

Paper Structure

This paper contains 24 sections, 7 equations, 7 figures, 2 tables.

Figures (7)

  • Figure 1: Manifold distance and Euclidean space.
  • Figure 2: Examples of zero-dimensional (a), one-dimensional (b), two-dimensional (c), and three-dimensional simplices (d).
  • Figure 3: Example of a VR complex. (a) Data points in $X$. (b) Each point surrounded by a ball of radius $\epsilon$. (c) Clusters in the VR complex $V(X, \epsilon)$.
  • Figure 4: The process of Rips complex filtration. Subfigures (a)-(g) are schematic diagrams at different radii of the data points, where overlapping regions are shaded darker. Subfigure (f) is the persistence diagram of the data points in subfigure (a) as their radius increases from $0$ to $\infty$.
  • Figure 5: Two-stage hierarchical feature selection framework for dimensionality reduction.
  • ...and 2 more figures
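
The cluster picture in the Figure 3 caption can be illustrated with a small sketch: place a ball of radius $\epsilon$ around each point and group points whose balls overlap. The convention assumed here is that two balls overlap when the centers are at most $2\epsilon$ apart (some texts use $\epsilon$ as the distance threshold instead). The function name `vr_clusters` and the toy points are hypothetical, and only connected components are computed, not higher-dimensional simplices.

```python
import math


def vr_clusters(points, epsilon):
    """Group point indices whose epsilon-balls are connected by overlaps."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            # Assumed convention: balls of radius epsilon overlap at distance <= 2 * epsilon.
            if math.dist(points[i], points[j]) <= 2 * epsilon:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(points)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Toy point cloud (assumption): a tight triple and a separate pair.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
for eps in (0.4, 0.6, 4.0):
    print(eps, vr_clusters(points, eps))
```

As the radius grows, the five singletons merge into two clusters and then into one. Tracking the radii at which such merges happen is what the zero-dimensional part of a persistence diagram records.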