FEMBA: Efficient and Scalable EEG Analysis with a Bidirectional Mamba Foundation Model

Anna Tegon, Thorir Mar Ingolfsson, Xiaying Wang, Luca Benini, Yawei Li

TL;DR

FEMBA introduces a bidirectional state-space EEG model built on Mamba to overcome the quadratic bottleneck of Transformer architectures for long EEG sequences. Pretrained on over 21,000 hours of unlabeled EEG, FEMBA achieves competitive performance on TUAB, TUAR, and TUSL while significantly reducing FLOPs and memory, including a Tiny 7.8M-parameter variant suitable for edge devices. The approach combines an architecture offered in four model sizes, forward and backward sequence processing, a self-supervised masking objective, and lightweight decoders, enabling deployments that scale from hospital-scale analysis to wearable devices. Overall, FEMBA demonstrates that linear-time state-space models can rival Transformer-based EEG models, offering practical advantages for real-world, resource-constrained settings.

Abstract

Accurate and efficient electroencephalography (EEG) analysis is essential for detecting seizures and artifacts in long-term monitoring, with applications spanning hospital diagnostics to wearable health devices. Robust EEG analytics have the potential to greatly improve patient care. However, traditional deep learning models, especially Transformer-based architectures, are hindered by their quadratic time and memory complexity, making them less suitable for resource-constrained environments. To address these challenges, we present FEMBA (Foundational EEG Mamba + Bidirectional Architecture), a novel self-supervised framework that establishes new efficiency benchmarks for EEG analysis through bidirectional state-space modeling. Unlike Transformer-based models, FEMBA scales linearly with sequence length, enabling more scalable and efficient processing of extended EEG recordings. Trained on over 21,000 hours of unlabeled EEG and fine-tuned on three downstream tasks, FEMBA achieves performance competitive with Transformer models at significantly lower computational cost. Specifically, it reaches 81.82% balanced accuracy (0.8921 AUROC) on TUAB and 0.949 AUROC on TUAR, while a tiny 7.8M-parameter variant demonstrates viability for resource-constrained devices. These results pave the way for scalable, general-purpose EEG analytics in clinical settings and highlight FEMBA as a promising candidate for wearable applications.

Paper Structure

This paper contains 25 sections, 4 equations, 3 figures, 4 tables.

Figures (3)

  • Figure 1: Overview of the proposed FEMBA (Foundational EEG Mamba + Bidirectional Architecture) pipeline. The input signal (with channels $C$ and length $T$) is first tokenized via a 2D convolution and flattening layer. Random masking is then applied to a subset of the patches for self-supervised learning. The masked tokens pass through the FEMBA encoder, which stacks multiple Bidirectional Mamba blocks. Within each block, the sequence is processed by parallel forward and backward Mamba components (the backward component operating on a temporally reversed input sequence). The outputs from both directions are then combined (e.g., via summation) to capture dependencies from both past and future contexts before potentially passing through normalization and feed-forward layers. Finally, a lightweight decoder (for reconstruction) or a classification head (for downstream tasks) reconstructs or classifies the signals, respectively.
  • Figure 2: Example of signal reconstruction during pre-training, with masked segments indicated in gray.
  • Figure 3: Comparison of LaBraM (jianglarge), FEMBA, and EEGFormer (chen2024eegformer) in terms of inference computation (left) and memory usage in megabytes (MB) (right).
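
The pipeline described in the Figure 1 caption (tokenize, mask, process in both directions, sum) can be illustrated with a small NumPy sketch. This is not the authors' implementation and it rests on several assumptions: `patchify` uses plain non-overlapping temporal patches in place of the 2D convolution, masked patches are simply zeroed, and `linear_scan` is a fixed-decay linear recurrence standing in for a real Mamba layer (which uses learned, input-dependent state-space parameters). All function names and parameters here are hypothetical. The sketch only aims to show how running one causal scan on the reversed sequence and summing the two outputs gives every token access to both past and future context at a cost linear in sequence length.

```python
import numpy as np


def patchify(x, patch_len):
    """Split a (C, T) signal into non-overlapping temporal patches.

    Assumption: simple patching replaces the 2D convolution tokenizer.
    Returns an array of shape (num_patches, C * patch_len).
    """
    C, T = x.shape
    n = T // patch_len
    patches = x[:, : n * patch_len].reshape(C, n, patch_len)
    return patches.transpose(1, 0, 2).reshape(n, C * patch_len)


def random_mask(tokens, ratio, rng):
    """Zero out a random subset of tokens; return (masked_tokens, mask)."""
    n = tokens.shape[0]
    n_masked = int(round(ratio * n))
    idx = rng.choice(n, size=n_masked, replace=False)
    mask = np.zeros(n, dtype=bool)
    mask[idx] = True
    masked = tokens.copy()
    masked[mask] = 0.0
    return masked, mask


def linear_scan(u, decay):
    """Causal recurrence h[t] = decay * h[t-1] + u[t], one pass over the sequence.

    Assumption: a fixed scalar decay stands in for a Mamba layer.
    """
    h = np.zeros_like(u, dtype=float)
    state = np.zeros(u.shape[1])
    for t in range(u.shape[0]):
        state = decay * state + u[t]
        h[t] = state
    return h


def bidirectional_block(u, decay=0.9):
    """Sum a forward scan and a backward scan over the same sequence."""
    fwd = linear_scan(u, decay)
    # Backward branch: reverse in time, scan causally, then restore the order.
    bwd = linear_scan(u[::-1], decay)[::-1]
    return fwd + bwd


rng = np.random.default_rng(0)
eeg = rng.standard_normal((4, 64))          # toy signal: C=4 channels, T=64 samples
tokens = patchify(eeg, patch_len=8)         # 8 tokens of dimension 32
masked_tokens, mask = random_mask(tokens, ratio=0.5, rng=rng)
features = bidirectional_block(masked_tokens)
```

With a single impulse in the middle of an otherwise zero sequence, `linear_scan` alone produces nonzero outputs only at and after the impulse, whereas `bidirectional_block` produces a response on both sides of it, which is the property the forward and backward branches are meant to provide.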