BioSerenity-E1: a self-supervised EEG model for medical applications

Ruggero G. Bettinardi, Mohamed Rahmouni, Ulysse Gimenez

TL;DR

BioSerenity-E1 introduces a self-supervised EEG foundation model built from spectral tokenization via a transformer-based VQ-VAE tokenizer and a Masked Token Predictor to learn rich, transferable features from 4000 hours of clinical EEG. The two-stage pretraining enables robust representations that are then fine-tuned for seizure detection, normal/abnormal EEG classification, and multiclass pathology tasks, achieving state-of-the-art or competitive performance across multiple datasets and metrics. The work demonstrates strong results in low-data regimes, indicating practical value for clinical deployment where labeled data are scarce. Limitations include channel-set generalization and codebook utilization, pointing to future work on diverse data, channel-agnostic inputs, and entropy-based improvements to unleash the full capacity of the learned representations.

Abstract

Electroencephalography (EEG) serves as an essential diagnostic tool in neurology; however, its accurate manual interpretation is a time-intensive process that demands highly specialized expertise, which remains relatively scarce and not consistently accessible. To address these limitations, the implementation of automated pre-screening and analysis systems for EEG data holds considerable promise. Advances in self-supervised learning have made it possible to pre-train complex deep learning architectures on large volumes of unlabeled EEG data to learn generalizable representations that can later be used to enhance performance on multiple tasks while needing less downstream data. In the present paper, we introduce BioSerenity-E1, the first of a family of self-supervised foundation models for clinical EEG applications that combines spectral tokenization with masked prediction to achieve state-of-the-art performance across relevant diagnostic tasks. The two-phase self-supervised pretraining framework first acquires compressed EEG representations via a transformer-based VQ-VAE architecture designed to reconstruct log-multitaper spectral projections, then applies extensive (70% block) masked token prediction to force the model to learn complex spatiotemporal dependencies in EEG signals. BioSerenity-E1 achieves strong performance across three clinical tasks, either in line with or above state-of-the-art methods: seizure detection (AUROC = 0.926, Sensitivity = 0.909), normal/abnormal classification (AUPRC = 0.970 on proprietary data; 0.910 on TUH-Abnormal), and multiclass pathology differentiation on unbalanced data (Weighted F1 = 0.730). The utility of BioSerenity-E1 is further confirmed in low-data regimes, showing clear improvements in AUPRC (from +2% to +17%) when trained on less than 10% of the available data.
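
To make the "70% block" masking idea concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a flat sequence of patch tokens and masks contiguous runs of tokens until the target ratio is reached. The function name `block_mask`, the `block_len` parameter, and the 1-D token layout are hypothetical choices made for illustration; the abstract does not specify how blocks are drawn.

```python
import random


def block_mask(n_tokens, mask_ratio=0.7, block_len=4, seed=0):
    """Return a list of booleans (True = masked) with round(mask_ratio * n_tokens) masked.

    Contiguous runs of up to `block_len` tokens are masked at random start
    positions until the target count is reached. `block_len` and the flat
    token layout are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    target = round(mask_ratio * n_tokens)
    mask = [False] * n_tokens
    n_masked = 0
    while n_masked < target:
        start = rng.randrange(n_tokens)
        # Mask a contiguous run, skipping tokens that are already masked
        # and stopping as soon as the target count is reached.
        for i in range(start, min(start + block_len, n_tokens)):
            if not mask[i] and n_masked < target:
                mask[i] = True
                n_masked += 1
    return mask


if __name__ == "__main__":
    mask = block_mask(100)
    print(sum(mask), "of", len(mask), "tokens masked")
```

Masking contiguous runs instead of isolated tokens is generally thought to make the prediction task harder, since a masked patch cannot be recovered by interpolating from its immediate neighbours.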

Paper Structure

This paper contains 22 sections, 11 figures, 4 tables.

Figures (11)

  • Figure 1: Tokenization and spectrum reconstruction. EEG tokenization is based on the following VQ-VAE architecture: pre-processed EEG is segmented into windows, which are divided into patches and processed through a temporal encoder. Position and channel information is embedded using sinusoidal encoding, and the resulting embeddings traverse a deep sequence of transformer blocks. The encoded vectors are then compressed through dense layers and quantized by mapping each one to the nearest codebook vector based on cosine similarity. Finally, a shallower decoder composed of transformer blocks and dense layers reconstructs the power spectra of the input patches. The trained encoder block of the VQ-VAE architecture displayed in the figure is what we refer to as the Tokenizer.
  • Figure 2: Masked-Token Predictor overview. Preprocessed EEG signals are first segmented into patches and processed by the pre-trained tokenizer to obtain, for each input patch, the index of the associated latent codebook vector. These indices serve as the prediction targets. A portion of the input patches is then replaced with a learnable mask and passed through a network with the same architecture as the pre-trained tokenizer, but with randomly initialized weights, to get the embedding vector associated with each patch. These embedding vectors are then used to predict the index of the codebook vector associated with each masked and unmasked patch. The model is trained by minimizing the cross-entropy loss between the predicted indices and the ground-truth codebook indices from the pre-trained tokenizer, for both masked and unmasked patches.
  • Figure 3: Fine-tuning overview. BioSerenity-E1 (i.e. the pre-trained Masked-Token Predictor) serves as the base model for a prediction head that is trained on the downstream task of interest. To accelerate fine-tuning, we froze all of the base model's weights, keeping only the prediction head trainable.
  • Figure 4: Seizure Detection on TUH-Seizure. Performance comparison of BioSerenity-E1 and baseline models on the seizure detection task using the TUH-Seizure dataset. BioSerenity-E1 demonstrates superior performance across most metrics, achieving the highest AUROC, AUPRC, Sensitivity, and Balanced Accuracy (indicated by stars). Error bars represent standard deviations over multiple runs. Models in shades of blue were run as baseline models to evaluate our model, whereas results of those in shades of grey represent the state-of-the-art obtained from the literature for binary seizure detection (see section “Baseline Models”).
  • Figure 5: Normal vs. Abnormal Classification on TUH-Abnormal. Performance comparison of BioSerenity-E1 against baseline models on the TUH-Abnormal EEG dataset for the normal-vs-abnormal classification task. BioSerenity-E1 ranks in the top quartile of state-of-the-art across all metrics, achieving the highest weighted F1 score and Sensitivity (TPR), as indicated by the orange star. Models in shades of blue were run as baseline models to evaluate our model, whereas results of those in shades of grey represent the state-of-the-art obtained from the literature for binary seizure detection (see section “Baseline Models”).
  • ...and 6 more figures
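
The captions of Figures 1 and 2 describe two operations that are easy to sketch in isolation: assigning each encoded vector to its nearest codebook entry by cosine similarity, and scoring a prediction against that index with a cross-entropy loss. The code below is a minimal pure-Python illustration under stated assumptions, not the paper's code. The names `l2_normalize`, `quantize`, and `cross_entropy`, and the tiny 2-D codebook, are hypothetical; the real model works on learned high-dimensional embeddings.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def quantize(vectors, codebook):
    """Return, for each vector, the index of the most cosine-similar codebook entry."""
    unit_codes = [l2_normalize(c) for c in codebook]
    indices = []
    for v in vectors:
        u = l2_normalize(v)
        # The dot product of unit vectors equals their cosine similarity.
        sims = [sum(a * b for a, b in zip(u, c)) for c in unit_codes]
        indices.append(max(range(len(sims)), key=sims.__getitem__))
    return indices


def cross_entropy(logits, target_index):
    """Cross-entropy of softmax(logits) against one target codebook index."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_index]


if __name__ == "__main__":
    codebook = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # toy 3-entry codebook
    encoded = [[2.0, 0.1], [0.1, 5.0], [-3.0, 1.0]]   # toy encoder outputs
    targets = quantize(encoded, codebook)
    print("token indices:", targets)
    # Hypothetical predictor logits for the first patch, scored against its token.
    print("loss:", cross_entropy([2.0, 0.5, -1.0], targets[0]))
```

In this framing, the tokenizer's indices play the role of class labels, so the masked-token predictor reduces to a classifier over the codebook for every patch position.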