Omni-iEEG: A Large-Scale, Comprehensive iEEG Dataset and Benchmark for Epilepsy Research

Chenda Duan; Yipeng Zhang; Sotaro Kanai; Yuanyi Ding; Atsuro Daida; Pengyue Yu; Tiancheng Zheng; Naoto Kuroda; Shaun A. Hussain; Eishi Asano; Hiroki Nariai; Vwani Roychowdhury

TL;DR

Omni-iEEG addresses the reproducibility and generalizability gaps in iEEG epilepsy research by introducing a large, harmonized dataset from eight centers (302 patients, 178 hours) with over 36K expert annotations and standardized metadata. It defines two clinically meaningful benchmark tasks—Clinical Prior-Driven Pathological Events Classification and Pathological Brain Region Identification—plus three exploratory tasks, and evaluates a diverse set of baselines ranging from biomarker-driven to end-to-end deep learning, including cross-domain transfer from audio models. Key findings show that end-to-end segment models like TimeConv-CNN can match or surpass biomarker-based approaches for identifying pathological tissue and predicting surgical outcomes, while cross-domain representations from audio models offer surprising transferability to iEEG tasks. The work provides a practical, clinically relevant benchmark that facilitates reproducible, cross-center epilepsy research and opens avenues for novel biomarkers and interpretable AI in surgical planning.

Abstract

Epilepsy affects over 50 million people worldwide, and one-third of patients suffer drug-resistant seizures where surgery offers the best chance of seizure freedom. Accurate localization of the epileptogenic zone (EZ) relies on intracranial EEG (iEEG). Clinical workflows, however, remain constrained by labor-intensive manual review. At the same time, existing data-driven approaches are typically developed on single-center datasets that are inconsistent in format and metadata, lack standardized benchmarks, and rarely release pathological event annotations, creating barriers to reproducibility, cross-center validation, and clinical relevance. With extensive efforts to reconcile heterogeneous iEEG formats, metadata, and recordings across publicly available sources, we present Omni-iEEG, a large-scale, pre-surgical iEEG resource comprising 302 patients and 178 hours of high-resolution recordings. The dataset includes harmonized clinical metadata such as seizure onset zones, resections, and surgical outcomes, all validated by board-certified epileptologists. In addition, Omni-iEEG provides over 36K expert-validated annotations of pathological events, enabling robust biomarker studies. Omni-iEEG serves as a bridge between machine learning and epilepsy research. It defines clinically meaningful tasks with unified evaluation metrics grounded in clinical priors, enabling systematic evaluation of models in clinically relevant settings. Beyond benchmarking, we demonstrate the potential of end-to-end modeling on long iEEG segments and highlight the transferability of representations pretrained on non-neurophysiological domains. Together, these contributions establish Omni-iEEG as a foundation for reproducible, generalizable, and clinically translatable epilepsy research. The project page with dataset and code links is available at omni-ieeg.github.io/omni-ieeg.

Paper Structure (44 sections, 4 figures, 13 tables)

Introduction
Related Work
Omni-iEEG Dataset
Benchmark Tasks for Interictal iEEG Analysis
Task 1: Clinical Prior-Driven Pathological Events Classification
Task 2: Pathological Brain Region Identification
Motivation and Clinical Background
Task Specification and Evaluation Metrics
Exploratory Tasks
Benchmark Results
Pathological Events Classification
Pathological Brain Region Identification
Beyond Benchmarking: Translational Insights from Omni-iEEG
Conclusion and Limitations
The Use of Large Language Models
...and 29 more sections

Figures (4)

Figure 1: Overview of the Omni-iEEG Dataset and Benchmark.
Figure 2: Event-based iEEG input: raw waveform (left) and time-frequency representation (right).
Figure 3: Channel Segment-based iEEG input: raw waveform (left) and time-frequency representation (right).
Figure 4: Model interpretation analysis using SHAP.
