EEG-X: Device-Agnostic and Noise-Robust Foundation Model for EEG

Navid Mohammadi Foumani, Soheila Ghane, Nam Nguyen, Mahsa Salehi, Geoffrey I. Webb, Geoffrey Mackellar

TL;DR

EEG-X addresses two obstacles to EEG foundation models, variability across recording devices and low signal-to-noise ratio, with three components: a location-based channel embedding built on a universal scalp coordinate mapping, a noise-aware dual reconstruction objective (latent-space self-prediction plus raw-space reconstruction of artifact-removed signals), and a dictionary-inspired convolutional transformation (DiCT) that shapes the reconstruction loss. The two reconstruction paths are trained under a unified objective $L_{total} = L_{rec} + L_{align} + L_{reg}$. Across seven diverse datasets and cross-domain transfers, EEG-X achieves state-of-the-art performance and generalizes well when electrode layouts differ between pretraining and downstream data. The models and code are publicly available.

Abstract

Foundation models for EEG analysis are still in their infancy, limited by two key challenges: (1) variability across datasets caused by differences in recording devices and configurations, and (2) the low signal-to-noise ratio (SNR) of EEG, where brain signals are often buried under artifacts and non-brain sources. To address these challenges, we present EEG-X, a device-agnostic and noise-robust foundation model for EEG representation learning. EEG-X introduces a novel location-based channel embedding that encodes spatial information and improves generalization across domains and tasks by allowing the model to handle varying channel numbers, combinations, and recording lengths. To enhance robustness against noise, EEG-X employs a noise-aware masking and reconstruction strategy in both raw and latent spaces. Unlike previous models that mask and reconstruct raw noisy EEG signals, EEG-X is trained to reconstruct denoised signals obtained through an artifact removal process, ensuring that the learned representations focus on neural activity rather than noise. To further enhance reconstruction-based pretraining, EEG-X introduces a dictionary-inspired convolutional transformation (DiCT) layer that projects signals into a structured feature space before computing reconstruction (MSE) loss, reducing noise sensitivity and capturing frequency- and shape-aware similarities. Experiments on datasets collected from diverse devices show that EEG-X outperforms state-of-the-art methods across multiple downstream EEG tasks and excels in cross-domain settings where pre-trained and downstream datasets differ in electrode layouts. The models and code are available at: https://github.com/Emotiv/EEG-X
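To make the idea of a location-based channel embedding concrete, here is a minimal sketch. It assumes a fixed sinusoidal encoding of each electrode's Cartesian scalp coordinates, which may differ from the embedding EEG-X actually uses. The coordinate values, the frequency set `FREQS`, and the helper names `embed` and `cosine` are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical, approximate scalp coordinates (x: right, y: front, z: up).
# These values are illustrative only and are not taken from the paper.
COORDS = {
    "F4": (0.45, 0.55, 0.70),
    "F2": (0.25, 0.60, 0.76),
    "F6": (0.65, 0.50, 0.57),
    "P7": (-0.80, -0.55, 0.20),
}

# Assumed low spatial frequencies, so similarity decays smoothly with distance.
FREQS = np.array([1.0, 0.5, 0.25, 0.125])


def embed(name: str) -> np.ndarray:
    """Map an electrode name to a sinusoidal embedding of its coordinates."""
    xyz = np.asarray(COORDS[name], dtype=float)
    angles = np.outer(xyz, FREQS)  # shape (3 axes, 4 frequencies)
    return np.concatenate([np.sin(angles).ravel(), np.cos(angles).ravel()])


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Similarity of F4 to every electrode in the toy montage.
sims = {ch: cosine(embed("F4"), embed(ch)) for ch in COORDS}
for ch, s in sims.items():
    print(f"F4 vs {ch}: {s:.4f}")
```

Because the embedding depends only on position, a channel that was never seen during pretraining still receives a meaningful vector, and any subset or ordering of channels can be embedded the same way. In this toy setup, F4 is more similar to the nearby F2 and F6 than to the distant P7.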

Paper Structure

This paper contains 23 sections, 8 equations, 7 figures, 6 tables.

Figures (7)

  • Figure 1: Architecture of EEG-X with three components: a tokenizer (with location-based channel embedding), latent-space self-reconstruction, and raw-space reconstruction with artifact removal and DiCT. The model is pretrained using latent- and raw-space reconstruction losses.
  • Figure 2: (a) Location-based channel embedding using a universal scalp coordinate mapping to assign each channel Cartesian coordinates. (b) Embedding similarity of electrode F4 with all other channels, showing higher similarity to nearby electrodes (e.g., F2, F6) than distant ones (e.g., P7).
  • Figure 3: Comparison of four different EEG headset configurations used for data collection.
  • Figure 4: Composite source signal used in the synthetic experiment, formed by summing three sine waves: low frequency (2 Hz, amplitude 5.0), mid frequency (20 Hz, amplitude 1.0), and high frequency (100 Hz, amplitude 0.5).
  • Figure 5: Three reconstructed signals generated from a combined input signal, illustrating different distortions: (1) low-frequency dominant, (2) high-frequency dominant, and (3) high-frequency dominant with phase shifts. These examples highlight the bias of direct MSE toward amplitude and the mitigating effect of DiCT (see the synthetic-experiment results table, label tab:synthetic_dict, for quantitative results).
  • ...and 2 more figures
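
The amplitude bias of direct MSE mentioned in the Figure 5 caption can be reproduced with the composite signal from Figure 4. The sketch below is a rough illustration, not the paper's experiment: the sampling rate, the one-second duration, and the three candidate reconstructions are assumptions chosen so the arithmetic is easy to follow.

```python
import numpy as np

FS = 1000               # assumed sampling rate in Hz
t = np.arange(FS) / FS  # one second of samples

# Components described in the Figure 4 caption.
low = 5.0 * np.sin(2 * np.pi * 2 * t)     # 2 Hz, amplitude 5.0
mid = 1.0 * np.sin(2 * np.pi * 20 * t)    # 20 Hz, amplitude 1.0
high = 0.5 * np.sin(2 * np.pi * 100 * t)  # 100 Hz, amplitude 0.5
source = low + mid + high


def mse(a: np.ndarray, b: np.ndarray) -> float:
    """Plain mean squared error in the raw signal space."""
    return float(np.mean((a - b) ** 2))


# Hypothetical reconstructions, loosely mirroring the three cases in Figure 5.
recon_low_only = low                    # keeps only the slow component
recon_high_only = mid + high            # keeps only the faster components
recon_high_shifted = (                  # faster components, quarter-cycle shift
    1.0 * np.cos(2 * np.pi * 20 * t) + 0.5 * np.cos(2 * np.pi * 100 * t)
)

errors = {
    "low_only": mse(source, recon_low_only),
    "high_only": mse(source, recon_high_only),
    "high_shifted": mse(source, recon_high_shifted),
}
for name, err in errors.items():
    print(f"{name}: MSE = {err:.3f}")
```

Each sinusoid contributes half its squared amplitude to the error when it is dropped, so the low-only reconstruction scores 0.625 while the two reconstructions that preserve the 20 Hz and 100 Hz content score 12.5 and 13.75. Raw MSE therefore rewards matching the large slow component and is nearly blind to the loss of low-amplitude, higher-frequency structure, which is the bias that a feature-space loss such as DiCT is meant to reduce.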