Affordable EEG, Actionable Insights: An Open Dataset and Evaluation Framework for Epilepsy Patient Stratification

HM Shadman Tabib, Md. Hasnaen Adil, Ayesha Rahman, Ahmmad Nur Swapnil, Maoyejatun Hasana, Ahmed Hossain Chowdhury, A. B. M. Alim Al Islam

TL;DR

NEUROSKY-EPI provides the first open, single-channel EEG dataset collected in a low-resource setting (25 patients, 2,032 one-second windows from a $Fp1$ forehead electrode) and introduces EmbedCluster, an unsupervised patient-stratification framework that leverages both clinically grounded EEGNet embeddings (pretrained on CHB-MIT) and context-aware autoencoder embeddings trained on NEUROSKY-EPI. The study demonstrates that affordable, consumer-grade EEG can support meaningful patient stratification, achieving about $62.5\%$ clustering accuracy across multiple algorithms, which exceeds chance ($50\%$), and discusses the trade-offs between transfer-learned and locally trained representations. It foregrounds human-centered and ethical considerations (deployability in resource-constrained settings, interpretability for non-specialists, privacy, bias, and inclusivity) while releasing data and code to catalyze interdisciplinary work in health technology, HCI, and ML for equitable epilepsy care. Overall, the work points to a practical path toward scalable, accessible EEG-based stratification that can inform referral, monitoring, and treatment in settings where traditional clinical EEG is unavailable.

Abstract

Access to clinical multi-channel EEG remains limited in many regions worldwide. We present NEUROSKY-EPI, the first open dataset of single-channel, consumer-grade EEG for epilepsy, collected in a South Asian clinical setting along with rich contextual metadata. To explore its utility, we introduce EmbedCluster, a patient-stratification pipeline that transfers representations from EEGNet models trained on clinical data and enriches them with contextual autoencoder embeddings, followed by unsupervised clustering of patients based on EEG patterns. Results show that low-cost, single-channel data can support meaningful stratification. Beyond algorithmic performance, we emphasize human-centered concerns such as deployability in resource-constrained environments, interpretability for non-specialists, and safeguards for privacy, inclusivity, and bias. By releasing the dataset and code, we aim to catalyze interdisciplinary research across health technology, human-computer interaction, and machine learning, advancing the goal of affordable and actionable EEG-based epilepsy care.

Paper Structure

This paper contains 24 sections, 5 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: NEUROSKY-EPI data collection. (a) Process overview: after consent and survey, a single-channel consumer EEG is set up. Each participant completes two controlled conditions: resting/eyes-closed (60 s) and awake/eyes-open (60 s). Sessions are segmented into 1-second windows and paired with clinical metadata. (b) Photograph of a data collection session (faces blurred).
  • Figure 2: Overview of our transfer learning approach (Stage 1 training). We train a seizure detection model on CHB-MIT, restricted to a single frontal channel to mirror the consumer headset. Raw 8-second EEG segments are converted into 10-dimensional band-power features to match the NeuroSky output. A generator network upsamples these features into a synthetic EEG sequence, which is processed by EEGNet, a compact CNN. EEGNet learns to classify seizure vs. non-seizure segments. This stage yields a calibrated model whose embeddings transfer to our consumer EEG domain.
  • Figure 3: Stage 2: EEGNet embedding extraction for unsupervised stratification. EEGNet embeddings are aggregated at the patient level (mean + std) and clustered. Clusters align with seizure frequency change annotations (Yes/No), achieving 62.50% accuracy across algorithms.
  • Figure 4: Alternative approach: Autoencoder-based stratification. Contextual autoencoder embeddings were aggregated per patient and clustered. Accuracy ranged from 58.33% to 62.50%, with GMM best matching seizure frequency change annotations. This self-contained method avoids reliance on external data, highlighting its deployability in local contexts.
  • Figure 5: Comparison of the EEGNet approach and the autoencoder approach.
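
The patient-level step described in the Figure 3 and Figure 4 captions (aggregate per-window embeddings as mean + std, cluster the patients, compare clusters with the Yes/No seizure-frequency-change annotations) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' released code: the function names, the toy embeddings, the use of population standard deviation, the plain two-cluster k-means standing in for the paper's clustering algorithms, and the scoring rule (best of the two possible cluster-to-label mappings) are all hypothetical choices made for the example.

```python
import random
from statistics import fmean, pstdev


def aggregate_patient(window_embeddings):
    """Collapse one patient's per-window embeddings into a single vector:
    per-dimension mean followed by per-dimension (population) std."""
    dims = list(zip(*window_embeddings))
    return [fmean(d) for d in dims] + [pstdev(d) for d in dims]


def sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_means(points, iters=50):
    """Plain 2-cluster k-means (Lloyd's algorithm) with a deterministic
    start: the first point and the point farthest from it."""
    centers = [list(points[0]),
               list(max(points, key=lambda p: sqdist(p, points[0])))]
    labels = None
    for _ in range(iters):
        new_labels = [
            0 if sqdist(p, centers[0]) <= sqdist(p, centers[1]) else 1
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old center if a cluster empties
                centers[k] = [fmean(d) for d in zip(*members)]
    return labels


def clustering_accuracy(labels, annotations):
    """Cluster ids are arbitrary, so score both cluster-to-label
    mappings and keep the better one (never below 0.5 for two classes)."""
    agree = sum(lab == ann for lab, ann in zip(labels, annotations))
    return max(agree, len(labels) - agree) / len(labels)


if __name__ == "__main__":
    # Toy stand-in for real embeddings: 8 synthetic "patients",
    # 60 windows each, 4-dimensional embeddings (all values made up).
    rng = random.Random(0)
    patients, annotations = [], []
    for i in range(8):
        shift = 0.0 if i < 4 else 3.0
        windows = [[rng.gauss(shift, 1.0) for _ in range(4)]
                   for _ in range(60)]
        patients.append(aggregate_patient(windows))
        annotations.append(0 if i < 4 else 1)
    labels = two_means(patients)
    print(f"accuracy: {clustering_accuracy(labels, annotations):.2%}")
```

Because a two-cluster assignment can always be relabeled to agree with at least half of the annotations, this scoring rule cannot fall below 50%, which is one reason the reported accuracies are compared against a 50% chance level. In practice a library implementation of k-means or a Gaussian mixture model would likely replace the hand-written `two_means`.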