Two-Stage Hierarchical and Explainable Feature Selection Framework for Dimensionality Reduction in Sleep Staging
Yangfan Deng, Hamad Albidah, Ahmed Dallal, Jijun Yin, Zhi-Hong Mao
TL;DR
The paper tackles the challenge of visualizing and classifying high-dimensional EEG data for sleep staging. It introduces a two-stage hierarchical and explainable feature selection framework that combines feature pruning by recursive feature elimination with cross-validation (RFECV) with three dimensionality reduction methods, augmented by persistent-homology topological features derived from Takens-embedded EEG data. Empirical results on Sleep-EDF show that adding topological features to spectral-temporal features improves classification, with t-SNE achieving the highest accuracy (79.8%) and UMAP offering a practical balance of performance and efficiency. The work advances explainable analysis of sleep-related EEG patterns by integrating topological structure information with traditional features through a transparent feature-selection process.
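The topological features come from a delay (Takens) embedding of each EEG signal followed by persistent homology. The following is a minimal pure-Python sketch under stated assumptions, not the authors' implementation: it computes only 0-dimensional persistence (connected components) of the Vietoris-Rips filtration using union-find, and the embedding dimension, delay, and the summary statistics in the hypothetical helper `topo_features` are illustrative choices rather than the paper's settings. Loop (H1) features would normally come from a dedicated library such as ripser or giotto-tda.

```python
import math
from itertools import combinations


def takens_embedding(signal, dim=3, delay=1):
    """Delay-embed a 1-D signal: point i is (x[i], x[i+delay], ..., x[i+(dim-1)*delay])."""
    n_points = len(signal) - (dim - 1) * delay
    if n_points <= 0:
        raise ValueError("signal too short for the requested dim and delay")
    return [tuple(signal[i + k * delay] for k in range(dim)) for i in range(n_points)]


def h0_persistence(points):
    """Finite death times of 0-dimensional Vietoris-Rips persistence.

    Every point is born at scale 0. Edges are added in order of length
    (Kruskal's algorithm); an edge that joins two components kills one of
    them at that length. One component never dies and is not returned.
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    # All pairwise edges: O(n^2 log n), fine only for short windows.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # edge merges two components: record a death
            parent[ri] = rj
            deaths.append(length)
    return deaths


def topo_features(signal, dim=3, delay=1):
    """Hypothetical summary statistics of the H0 barcode of one EEG epoch."""
    deaths = h0_persistence(takens_embedding(signal, dim, delay))
    return {
        "h0_total_persistence": sum(deaths),
        "h0_max_lifetime": max(deaths, default=0.0),
        "h0_mean_lifetime": sum(deaths) / len(deaths) if deaths else 0.0,
    }
```

Because every H0 bar is born at 0, its lifetime equals its death time, so the summaries above reduce to statistics of the merge lengths. Features like these could be appended to each epoch's spectral-temporal feature vector before selection.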
Abstract
Sleep is crucial for human health, and EEG signals play a significant role in sleep research. Because EEG signal sequences are high-dimensional, visualizing the data and clustering the different sleep stages remain challenging. To address these issues, we propose a two-stage hierarchical and explainable feature selection framework that incorporates a feature selection algorithm to improve the performance of dimensionality reduction. Inspired by topological data analysis, which can characterize the structure of high-dimensional data, we extract topological features from the EEG signals to compensate for the structural information lost in traditional spectro-temporal analysis. Both the topological visualization of data from different sleep stages and the classification results show that the proposed features are effective supplements to traditional features. Finally, we compare the performance of three dimensionality reduction algorithms: Principal Component Analysis (PCA), t-Distributed Stochastic Neighbor Embedding (t-SNE), and Uniform Manifold Approximation and Projection (UMAP). Among them, t-SNE achieved the highest accuracy, 79.8%, but UMAP is the best choice when computational cost and the other metrics are considered together.
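The combination of a feature selection stage feeding a dimensionality reduction stage can be sketched with scikit-learn. This is an illustrative assumption, not the authors' code: the RFECV base estimator (logistic regression), the k-nearest-neighbors classifier used to score the low-dimensional embedding, the hypothetical helper `two_stage_pipeline`, and the synthetic five-class data are all stand-ins. PCA is used because scikit-learn's `TSNE` offers only `fit_transform` and no `transform` for unseen data, so it cannot sit inside a pipeline evaluated on held-out epochs; `umap.UMAP` from the separate umap-learn package does provide `transform` and could replace PCA if installed.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def two_stage_pipeline(n_components=2, inner_folds=5):
    """Hypothetical pipeline: RFECV feature pruning, then a low-dimensional projection."""
    # Stage 1: recursive feature elimination with cross-validation. The linear
    # model ranks features by coefficient magnitude; RFECV keeps the subset size
    # with the best inner-CV accuracy. min_features_to_select ensures at least
    # n_components features survive for the projection step.
    selector = RFECV(
        LogisticRegression(max_iter=1000),
        step=1,
        min_features_to_select=n_components,
        cv=StratifiedKFold(n_splits=inner_folds),
        scoring="accuracy",
    )
    # Stage 2: project the retained features (PCA as the stand-in reducer)
    # and classify sleep stages in the projected space.
    return make_pipeline(
        StandardScaler(),
        selector,
        PCA(n_components=n_components),
        KNeighborsClassifier(n_neighbors=5),
    )


if __name__ == "__main__":
    # Synthetic stand-in for an epochs-by-features matrix with five classes
    # (e.g. W, N1, N2, N3, REM); not Sleep-EDF data.
    X, y = make_classification(
        n_samples=500, n_features=20, n_informative=5, n_classes=5, random_state=0
    )
    scores = cross_val_score(two_stage_pipeline(), X, y, cv=5)
    print(f"mean CV accuracy: {scores.mean():.3f}")
```

Wrapping both stages in one pipeline means the outer cross-validation refits RFECV inside each training fold, so feature selection never sees the held-out epochs.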
