SAMBA: Toward a Long-Context EEG Foundation Model via Spatial Embedding and Differential Mamba

Jiazhen Hong, Geoffrey Mackellar, Soheila Ghane

TL;DR

SAMBA addresses the challenge of long-context EEG modeling by integrating a Mamba2-based U-shaped encoder–decoder with a 3D spatial-adaptive input embedding and a Multi-Head Differential Mamba. It introduces Temporal Semantic Random Masking and a Time–Frequency reconstruction objective to preserve temporal and spectral information across very long sequences. Across 13 EEG datasets, SAMBA achieves superior in-domain and cross-domain performance while offering favorable memory and inference efficiency, plus interpretable spatial weight maps aligned with neurophysiological regions. These findings support SAMBA as a scalable foundation model for real-time brain–computer interface applications across heterogeneous electrode montages and durations.

Abstract

Long-sequence electroencephalogram (EEG) modeling is essential for developing generalizable EEG representation models. This need arises from the high sampling rate of EEG data and the long recording durations required to capture extended neurological patterns in brain activity. Transformer-based models have shown promise in modeling short sequences of a few seconds; however, their quadratic complexity limits scalability to longer contexts. Moreover, variability in electrode montage across available datasets, along with inter-subject differences in brain signals, poses significant challenges to developing a generalizable and robust foundation model. We propose SAMBA, a self-supervised learning framework with a Mamba-based U-shaped encoder-decoder architecture, which effectively captures long-range temporal dependencies and spatial variability in EEG data. Leveraging Mamba's inherent ability to process long contexts, we introduce: (1) Temporal Semantic Random Masking for semantic-level sequence reconstruction, (2) a Multi-Head Differential Mamba module to suppress redundancy and emphasize salient temporal structures, and (3) a Spatial-Adaptive Input Embedding that learns unified embeddings in a three-dimensional Euclidean space, enabling robustness across devices. Experiments on thirteen EEG datasets across diverse tasks, electrode configurations, and sequence durations demonstrate that SAMBA consistently outperforms state-of-the-art methods while maintaining low memory consumption and inference time. We also show that the learned spatial weight maps from our embedding module align closely with task-relevant neurophysiological regions, demonstrating the learnability and interpretability of SAMBA. These results highlight SAMBA's scalability and practical potential as a foundation model for real-time brain-computer interface applications.
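
The abstract does not spell out how the Spatial-Adaptive Input Embedding maps different montages into one space, so the following is only an illustrative sketch of the general idea of projecting channels through weights derived from 3D electrode coordinates. The function names (`spatial_weights`, `project`), the softmax-over-negative-distance weighting, and the `temperature` parameter are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def spatial_weights(src_xyz, tgt_xyz, temperature=0.1):
    """Hypothetical weighting: softmax over negative Euclidean distance.

    src_xyz: (S, 3) coordinates of the recording's electrodes.
    tgt_xyz: (T, 3) coordinates of a fixed target layout.
    Returns a (T, S) matrix whose rows are non-negative and sum to 1.
    """
    diff = tgt_xyz[:, None, :] - src_xyz[None, :, :]   # (T, S, 3)
    dist = np.linalg.norm(diff, axis=-1)               # (T, S)
    logits = -dist / temperature
    logits -= logits.max(axis=1, keepdims=True)        # numerical stability
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)

def project(eeg, src_xyz, tgt_xyz, temperature=0.1):
    """Map EEG of shape (S, time) onto the target layout, giving (T, time)."""
    return spatial_weights(src_xyz, tgt_xyz, temperature) @ eeg
```

Under this assumption, recordings with different electrode counts all land in the same `(T, time)` shape, which is one simple way a single model could consume heterogeneous montages. In a learned embedding the weights would be trainable rather than fixed by distance alone.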

Paper Structure

This paper contains 37 sections, 15 equations, 10 figures, 7 tables.

Figures (10)

  • Figure 1: Spatial embedding compatibility of SAMBA and prior EEG models across heterogeneous electrode layouts.
  • Figure 2: t-SNE plots from the Crowdsourced (a-b) and P300 (c-d) datasets, comparing the distribution of raw EEG (a, c) and representations learned by SAMBA (b, d).
  • Figure 3: SAMBA Architecture.
  • Figure 4: Comparison of proposed TSR masking with existing strategies.
  • Figure 5: SAIE projects EEG from input to target space using spatial weights derived from relative 3D coordinates.
  • ...and 5 more figures