Omni-iEEG: A Large-Scale, Comprehensive iEEG Dataset and Benchmark for Epilepsy Research
Chenda Duan, Yipeng Zhang, Sotaro Kanai, Yuanyi Ding, Atsuro Daida, Pengyue Yu, Tiancheng Zheng, Naoto Kuroda, Shaun A. Hussain, Eishi Asano, Hiroki Nariai, Vwani Roychowdhury
TL;DR
Omni-iEEG addresses the reproducibility and generalizability gaps in iEEG epilepsy research by introducing a large, harmonized dataset from eight centers (302 patients, 178 hours) with over 36K expert annotations and standardized metadata. It defines two clinically meaningful benchmark tasks (Clinical Prior-Driven Pathological Events Classification and Pathological Brain Region Identification) plus three exploratory tasks. It evaluates a diverse set of baselines, from biomarker-driven methods to end-to-end deep learning, including cross-domain transfer from audio models. The key findings are that end-to-end segment models such as TimeConv-CNN can match or surpass biomarker-based approaches in identifying pathological tissue and predicting surgical outcomes, and that representations from audio models transfer surprisingly well to iEEG tasks. The work provides a practical, clinically relevant benchmark that supports reproducible, cross-center epilepsy research and opens avenues for novel biomarkers and interpretable AI in surgical planning.
Abstract
Epilepsy affects over 50 million people worldwide, and one-third of patients have drug-resistant seizures, for whom surgery offers the best chance of seizure freedom. Accurate localization of the epileptogenic zone (EZ) relies on intracranial EEG (iEEG), yet clinical workflows remain constrained by labor-intensive manual review. Existing data-driven approaches have their own limitations. They are typically developed on single-center datasets that are inconsistent in format and metadata, lack standardized benchmarks, and rarely provide pathological event annotations. These gaps create barriers to reproducibility, cross-center validation, and clinical relevance. Through extensive efforts to reconcile heterogeneous iEEG formats, metadata, and recordings across publicly available sources, we present Omni-iEEG, a large-scale pre-surgical iEEG resource comprising 302 patients and 178 hours of high-resolution recordings. The dataset includes harmonized clinical metadata, such as seizure onset zones, resection areas, and surgical outcomes, all validated by board-certified epileptologists. In addition, Omni-iEEG provides over 36K expert-validated annotations of pathological events, enabling robust biomarker studies. Omni-iEEG serves as a bridge between machine learning and epilepsy research. It defines clinically meaningful tasks with unified evaluation metrics grounded in clinical priors, enabling systematic evaluation of models in clinically relevant settings. Beyond benchmarking, we demonstrate the potential of end-to-end modeling on long iEEG segments and highlight the transferability of representations pretrained on non-neurophysiological domains. Together, these contributions establish Omni-iEEG as a foundation for reproducible, generalizable, and clinically translatable epilepsy research. The project page, with links to the dataset and code, is available at omni-ieeg.github.io/omni-ieeg.
