Large EEG-U-Transformer for Time-Step Level Detection Without Pre-Training
Kerui Wu, Ziyue Zhao, Bülent Yener
TL;DR
This work reframes EEG detection as time-step level sequence labeling, which removes the need for post-processing and for heavy pre-training. It introduces a simple U-shaped encoder-decoder that combines local convolutional features with global context from a Transformer, plus an attention-pooling path for window-level predictions. Across seizure detection, sleep staging, and pathology tasks, the model achieves state-of-the-art performance with fast inference and strong cross-subject generalization, all without any pre-training. The approach is practical for real-world clinical deployment and opens avenues for unsupervised pre-training within the same architecture.
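As a rough illustration of the attention-pooling idea mentioned above, the sketch below collapses per-time-step features into a single window-level vector using softmax weights. It is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the function name `attention_pool`, the single `query` vector, and the scaled dot-product scoring are all hypothetical choices made for illustration.

```python
import numpy as np

def attention_pool(features, query):
    """Collapse per-time-step features (T, D) into one window-level vector (D,).

    `query` is a hypothetical learnable vector of shape (D,); in this sketch
    it is simply passed in as a fixed array (assumption, not from the paper).
    """
    scores = features @ query / np.sqrt(features.shape[1])  # one score per time step, shape (T,)
    scores = scores - scores.max()       # subtract the max to stabilise the softmax
    weights = np.exp(scores)
    weights = weights / weights.sum()    # non-negative weights that sum to 1
    pooled = weights @ features          # weighted average over time, shape (D,)
    return pooled, weights

# Toy input: 256 time steps with 16 features each (illustrative sizes only).
rng = np.random.default_rng(0)
T, D = 256, 16
features = rng.standard_normal((T, D))
query = rng.standard_normal(D)

pooled, weights = attention_pool(features, query)
print(pooled.shape, weights.shape)
```

With an all-zero `query` every time step receives the same weight, so the pooling reduces to a plain average over time; a trained query would instead emphasise the time steps most relevant to the window-level label.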
Abstract
Electroencephalography (EEG) reflects the brain's functional state, making it a crucial tool for diverse detection applications such as seizure detection and sleep stage classification. While deep learning-based approaches have recently shown promise for automated detection, traditional models are often constrained by a limited number of learnable parameters and achieve only modest performance. In contrast, large foundation models have shown improved capabilities by scaling up model size, but they require extensive, time-consuming pre-training. Moreover, both types of existing methods require complex and redundant post-processing pipelines to convert discrete labels into continuous annotations. In this work, motivated by the multi-scale nature of EEG events, we propose a simple U-shaped model that efficiently learns representations by capturing both local and global features with convolutional and self-attention modules for sequence-to-sequence modeling. Unlike window-level classification models, our method directly outputs predictions at the time-step level, eliminating redundant overlapping inferences. Beyond sequence-to-sequence modeling, the architecture extends naturally to window-level classification by incorporating an attention-pooling layer. In our experiments, this paradigm shift and model design yield improved efficiency, cross-subject generalization, and state-of-the-art performance on a range of time-step and window-level classification tasks. Notably, our model can be scaled up to the same size as existing large foundation models that have been extensively pre-trained on diverse datasets, and it outperforms them using only the downstream fine-tuning dataset. Our model won 1st place in the 2025 "seizure detection challenge" organized at the International Conference on Artificial Intelligence in Epilepsy and Other Neurological Disorders.
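To make the cost of redundant overlapping inferences concrete, the following back-of-the-envelope sketch counts how many input samples a sliding-window classifier processes compared with a model that labels every time step of non-overlapping segments once. All numbers are illustrative assumptions (a one-hour recording at 256 Hz, 10 s windows, 1 s stride); they are not taken from the paper, and the function names are hypothetical.

```python
def window_level_passes(n_samples, window, stride):
    """Number of forward passes for sliding-window classification."""
    if n_samples < window:
        return 0
    return (n_samples - window) // stride + 1

def samples_processed_sliding(n_samples, window, stride):
    """Total input samples pushed through the model, overlaps included."""
    return window_level_passes(n_samples, window, stride) * window

def samples_processed_seq2seq(n_samples, window):
    """Non-overlapping segments, each labelled at every time step in one pass.

    The last segment is assumed to be padded up to a full window.
    """
    n_segments = -(-n_samples // window)  # ceiling division
    return n_segments * window

# Illustrative settings (assumptions, not values from the paper).
fs = 256                  # sampling rate in Hz
n_samples = 3600 * fs     # one hour of signal
window = 10 * fs          # 10-second window
stride = 1 * fs           # 1-second hop between windows

sliding = samples_processed_sliding(n_samples, window, stride)
seq2seq = samples_processed_seq2seq(n_samples, window)
print(window_level_passes(n_samples, window, stride), sliding, seq2seq)
print(f"overlap overhead: {sliding / seq2seq:.2f}x")
```

Under these assumed settings the sliding-window approach pushes roughly ten times as many samples through the model, because each sample falls inside about `window / stride` windows, whereas time-step level output touches each sample once.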
