Affordable EEG, Actionable Insights: An Open Dataset and Evaluation Framework for Epilepsy Patient Stratification
HM Shadman Tabib, Md. Hasnaen Adil, Ayesha Rahman, Ahmmad Nur Swapnil, Maoyejatun Hasana, Ahmed Hossain Chowdhury, A. B. M. Alim Al Islam
TL;DR
NEUROSKY-EPI provides the first open, single-channel EEG dataset collected in a low-resource setting (25 patients, 2,032 one-second windows from an Fp1 forehead electrode) and introduces EmbedCluster, an unsupervised patient-stratification framework that leverages both clinically grounded EEGNet embeddings (pretrained on CHB-MIT) and context-aware autoencoder embeddings trained on NEUROSKY-EPI. The study demonstrates that affordable, consumer-grade EEG can support meaningful patient stratification, achieving about 62.5% clustering accuracy across multiple algorithms, which exceeds chance (50%), and discusses the trade-offs between transfer-learned and locally trained representations. It foregrounds human-centered and ethical considerations (deployability in resource-constrained settings, interpretability for non-specialists, privacy, bias, and inclusivity) while releasing data and code to catalyze interdisciplinary work in health technology, HCI, and ML for equitable epilepsy care. Overall, the work points to a practical path toward scalable, accessible EEG-based stratification that can inform referral, monitoring, and treatment in settings where traditional clinical EEG is unavailable.
Abstract
Access to clinical multi-channel EEG remains limited in many regions worldwide. We present NEUROSKY-EPI, the first open dataset of single-channel, consumer-grade EEG for epilepsy, collected in a South Asian clinical setting along with rich contextual metadata. To explore its utility, we introduce EmbedCluster, a patient-stratification pipeline that transfers representations from EEGNet models trained on clinical data and enriches them with contextual autoencoder embeddings, followed by unsupervised clustering of patients based on EEG patterns. Results show that low-cost, single-channel data can support meaningful stratification. Beyond algorithmic performance, we emphasize human-centered concerns such as deployability in resource-constrained environments, interpretability for non-specialists, and safeguards for privacy, inclusivity, and bias. By releasing the dataset and code, we aim to catalyze interdisciplinary research across health technology, human-computer interaction, and machine learning, advancing the goal of affordable and actionable EEG-based epilepsy care.
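To make the evaluation idea concrete, the following is a minimal sketch of how window-level embeddings might be pooled per patient and how a clustering accuracy could be scored against reference labels. It is not the authors' implementation: mean pooling over windows, the best-label-mapping definition of accuracy, and the helper names `patient_embeddings` and `clustering_accuracy` are all assumptions made for illustration. The sketch also assumes the number of clusters does not exceed the number of reference classes.

```python
from itertools import permutations

import numpy as np


def patient_embeddings(window_embeddings, patient_ids):
    """Average window-level embeddings into one vector per patient.

    Mean pooling is an assumption of this sketch, not a detail from the paper.
    """
    window_embeddings = np.asarray(window_embeddings, dtype=float)
    patient_ids = np.asarray(patient_ids)
    unique_ids = np.unique(patient_ids)
    pooled = np.stack(
        [window_embeddings[patient_ids == pid].mean(axis=0) for pid in unique_ids]
    )
    return unique_ids, pooled


def clustering_accuracy(true_labels, cluster_labels):
    """Best accuracy over all one-to-one mappings of cluster ids to class labels.

    Cluster ids are arbitrary, so each candidate mapping is tried and the
    highest agreement with the reference labels is returned.
    """
    true_labels = np.asarray(true_labels)
    cluster_labels = np.asarray(cluster_labels)
    classes = np.unique(true_labels)
    clusters = np.unique(cluster_labels)
    best = 0.0
    for perm in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))
        mapped = np.array([mapping[c] for c in cluster_labels])
        best = max(best, float((mapped == true_labels).mean()))
    return best


# Toy usage with made-up numbers (not data from NEUROSKY-EPI).
ids, pooled = patient_embeddings(
    [[0.0, 0.0], [2.0, 2.0], [4.0, 4.0]], ["a", "a", "b"]
)
acc = clustering_accuracy([0, 0, 0, 0, 1, 1, 1, 1], [1, 1, 1, 0, 0, 0, 0, 1])
```

Under this definition, a two-cluster solution scored against two reference classes can never fall below 50%, because the better of the two possible label mappings is always chosen.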
