SpecMoE: Spectral Mixture-of-Experts Foundation Model for Cross-Species EEG Decoding

D. Darankoum, C. Habermacher, J. Volle, S. Grudinin

Abstract

Decoding the orchestration of neural activity in electroencephalography (EEG) signals is a central challenge in bridging neuroscience with artificial intelligence. Foundation models have made strides in generalized EEG decoding, yet many existing frameworks rely primarily on separate temporal and spectral masking of raw signals during self-supervised pretraining. Such strategies tend to bias learning toward high-frequency oscillations, as low-frequency rhythmic patterns can be easily inferred from the unmasked signal. We introduce a foundation model that uses a novel Gaussian-smoothed masking scheme applied to short-time Fourier transform (STFT) maps. By jointly applying time, frequency, and time-frequency Gaussian masks, we make the reconstruction task much more challenging, forcing the model to learn intricate neural patterns across both high- and low-frequency domains. To effectively recover signals under this aggressive masking strategy, we design SpecHi-Net, a U-shaped hierarchical architecture with multiple encoding and decoding stages. To accelerate large-scale pretraining, we partition the data into three subsets, each used to train an independent expert model. We then combine these models through SpecMoE, a mixture-of-experts framework guided by a learned spectral gating mechanism. SpecMoE achieves state-of-the-art performance across a diverse set of EEG decoding tasks, including sleep staging, emotion recognition, motor imagery classification, abnormal signal detection, and drug effect prediction. Importantly, the model demonstrates strong cross-species and cross-subject generalization, maintaining high accuracy on both human and murine EEG datasets.
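To make the masking idea concrete, here is a minimal NumPy sketch of jointly applying time, frequency, and time-frequency Gaussian masks to an STFT magnitude map. It is an illustration under stated assumptions, not the authors' implementation: the function names (`stft_magnitude`, `gaussian_bump`, `gaussian_keep_mask`), the window and hop sizes, the mask widths, the use of one mask of each kind, and the multiplicative way the three masks are combined are all hypothetical choices, since the abstract does not specify them.

```python
import numpy as np

def stft_magnitude(x, n_fft=64, hop=32):
    """Magnitude STFT of a 1-D signal with a Hann window, shape (freq, time)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    # rfft gives (time, freq); transpose so rows are frequency bins.
    return np.abs(np.fft.rfft(frames, axis=1)).T

def gaussian_bump(n, center, sigma):
    """Gaussian profile over n bins, equal to 1 at `center`."""
    idx = np.arange(n)
    return np.exp(-0.5 * ((idx - center) / sigma) ** 2)

def gaussian_keep_mask(n_freq, n_time, rng, sigma_f=3.0, sigma_t=3.0):
    """Keep-mask in [0, 1]: 0 means fully masked, 1 means untouched.

    Assumption: one mask of each kind, with randomly drawn centers.
    """
    f0, f1 = rng.integers(0, n_freq, size=2)
    t0, t1 = rng.integers(0, n_time, size=2)
    freq_band = gaussian_bump(n_freq, f0, sigma_f)[:, None]  # all time frames
    time_band = gaussian_bump(n_time, t0, sigma_t)[None, :]  # all frequencies
    tf_blob = (
        gaussian_bump(n_freq, f1, sigma_f)[:, None]
        * gaussian_bump(n_time, t1, sigma_t)[None, :]
    )  # localized in both time and frequency
    # Assumed combination rule: multiply the three soft keep factors.
    return (1 - freq_band) * (1 - time_band) * (1 - tf_blob)

# Toy signal: a 6 Hz and a 40 Hz component sampled at 256 Hz for 4 seconds.
rng = np.random.default_rng(0)
t = np.arange(1024) / 256.0
x = np.sin(2 * np.pi * 6 * t) + 0.5 * np.sin(2 * np.pi * 40 * t)

spec = stft_magnitude(x)                  # (33, 31) for these settings
keep = gaussian_keep_mask(*spec.shape, rng)
masked = spec * keep                      # input for a reconstruction model
```

Unlike a rectangular mask, the soft edges attenuate neighboring bins gradually, and the frequency mask can suppress a low-frequency band across the whole window, so that band cannot simply be inferred from the unmasked frames.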

Paper Structure (63 sections, 8 equations, 16 figures, 13 tables)


Figures (16)

  • Figure 1: Left: previous masking strategies, where rectangular masks remove some time frames. Middle: original EEG signals. Right: proposed Gaussian masks remove some frequency oscillations (mostly low frequencies) in addition to certain time frames. The horizontal axis represents time in seconds, and the vertical axis is voltage in $\mu$V.
  • Figure 2: SpecMoE overview. A) Gaussian-based masking pipeline. B) Hierarchical encoder, with 'k' standing for 'kernel'. C) Hierarchical decoder. D) Reconstruction objective. E) Fine-tuning pipeline.
  • Figure 3: SpecMoE ablation results. We show the absolute value of the balanced accuracy for the SpecMoE model and relative differences for six ablation experiments. TF stands for time-frequency. See SI for other metrics.
  • Figure S1: SpecMoE ablation results - balanced accuracy. We show the absolute value of the balanced accuracy for the SpecMoE model and relative differences for six ablation experiments. TF stands for time-frequency.
  • Figure S2: SpecMoE ablation results - AUPRC. We show the absolute value of AUPRC for the SpecMoE model and relative differences for six ablation experiments. TF stands for time-frequency.
  • ...and 11 more figures
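
The abstract describes combining three independently pretrained experts through a learned spectral gating mechanism. The following NumPy sketch shows one way such a gate could work, purely as an assumption: the gate input (a log power spectrum), the single linear gating layer, the softmax weighting, and the linear placeholder experts are hypothetical, and the gate weights here are random rather than learned.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def spectral_features(x):
    """Log power spectrum: (batch, n_samples) -> (batch, n_samples // 2 + 1)."""
    power = np.abs(np.fft.rfft(x, axis=-1)) ** 2
    return np.log1p(power)

def mixture_of_experts(x, experts, gate_w, gate_b):
    """Weight each expert's output by a gate computed from spectral features."""
    gate = softmax(spectral_features(x) @ gate_w + gate_b)  # (batch, n_experts)
    outputs = np.stack([f(x) for f in experts], axis=1)     # (batch, n_experts, dim)
    mixed = np.einsum("be,bed->bd", gate, outputs)          # (batch, dim)
    return mixed, gate

rng = np.random.default_rng(0)
batch, n_samples, dim, n_experts = 4, 128, 8, 3

x = rng.standard_normal((batch, n_samples))  # stand-in for EEG segments

# Placeholder experts: random linear maps instead of pretrained networks.
expert_weights = [
    0.1 * rng.standard_normal((n_samples, dim)) for _ in range(n_experts)
]
experts = [lambda inp, w=w: inp @ w for w in expert_weights]

# Gate parameters would be learned in practice; random here for illustration.
gate_w = 0.01 * rng.standard_normal((n_samples // 2 + 1, n_experts))
gate_b = np.zeros(n_experts)

mixed, gate = mixture_of_experts(x, experts, gate_w, gate_b)
```

Each row of `gate` sums to one, so `mixed` is a convex combination of the three expert outputs for each input segment.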