Brain4FMs: A Benchmark of Foundation Models for Electrical Brain Signal

Fanqi Shen, Enhong Yang, Jiahe Li, Junru Hong, Xiaoran Pan, Zhizhang Yuan, Meng Li, Yang Yang

TL;DR

Brain4FMs presents a unified SSL-centric benchmark and taxonomy for Brain Foundation Models applied to EEG/iEEG signals. It builds an open platform evaluating 15 BFMs across 18 public datasets spanning disease diagnosis, sleep staging, communication, and affective computing, with standardized preprocessing and cross-subject finetuning. The study reveals modality-driven transfer patterns, varying strengths of generative versus contrastive SSL strategies, and the importance of spatial/topological and frequency-domain representations for cross-subject generalization. This framework enables principled, reproducible comparisons and provides actionable insights to guide the design of more transferable BFMs with clinical impact. Overall, Brain4FMs offers a scalable, extensible benchmark to accelerate development and evaluation of foundation models for electrical brain signals.

Abstract

Brain Foundation Models (BFMs) are transforming neuroscience by enabling scalable and transferable learning from neural signals, advancing both clinical diagnostics and cutting-edge neuroscience exploration. Their emergence is powered by large-scale clinical recordings, particularly electroencephalography (EEG) and intracranial EEG, which provide rich temporal and spatial representations of brain dynamics. However, despite their rapid proliferation, the field lacks a unified understanding of existing methodologies and a standardized evaluation framework. To fill this gap, we map the benchmark design space along two axes: (i) from the model perspective, we organize BFMs under a self-supervised learning (SSL) taxonomy; and (ii) from the dataset perspective, we summarize common downstream tasks and curate representative public datasets across clinical and human-centric neurotechnology applications. Building on this consolidation, we introduce Brain4FMs, an open evaluation platform with plug-and-play interfaces that integrates 15 representative BFMs and 18 public datasets. It enables standardized comparisons and analysis of how pretraining data, SSL strategies, and architectures affect generalization and downstream performance, guiding more accurate and transferable BFMs. The code is available at https://anonymous.4open.science/r/Brain4FMs-85B8.

Paper Structure (47 sections, 7 equations, 4 figures, 40 tables)

Figures (4)

  • Figure 1: Overview of BFMs. (a) A unified pretraining pipeline. EEG/iEEG recordings are preprocessed, encoded into latent representations, and optimized under different SSL paradigms. (b) Model scale statistics. Parameter-size buckets by training paradigm are shown for the reported subset, using each model’s maximum parameter count, alongside yearly model counts under the same grouping. (c) Timeline-style family tree of representative BFMs from 2021 to 2025, organized by paradigm and annotated with major methodological shifts.
  • Figure 2: Overview of benchmark pipeline. (a) Data acquisition scenarios covering sleep staging, affective computing, disease diagnosis, and communication. (b) Evaluation under cross-subject and cross-validation protocols, with a standardized EEG/iEEG preprocessing pipeline.
  • Figure 3: Supplemental analyses. (a) Box plots of decision-boundary diagnostics (q=0.10) comparing contrastive and generative models on cross-subject tasks. (b) AUROC of NeuroGPT variants across tasks. (c) Slopegraph of AUROC changes from original to channel-permuted, comparing spatially strong and weak models. (d) Heatmap of Spearman's rank correlation between band-wise predictability and model performance rankings across tasks. Points in (a–c) denote n-fold cross-validation means; asterisks in (c–d) indicate * (p<0.05) and ** (p<0.001).
  • Figure 4: Sensitivity of decision-boundary error concentration to the boundary definition.
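
The cross-subject protocol mentioned in Figure 2(b) can be illustrated with a minimal sketch. It assumes that each recording segment carries a subject identifier and that folds are formed by holding out whole subjects; the function name `cross_subject_folds` and the round-robin assignment of subjects to folds are hypothetical choices for illustration, not the benchmark's actual implementation.

```python
from collections import defaultdict


def cross_subject_folds(subject_ids, n_folds):
    """Yield (train_idx, test_idx) pairs in which no subject appears on both sides.

    Hypothetical helper: subjects are sorted and assigned to folds round-robin.
    """
    # Group segment indices by the subject they were recorded from.
    by_subject = defaultdict(list)
    for idx, sid in enumerate(subject_ids):
        by_subject[sid].append(idx)

    subjects = sorted(by_subject)
    if n_folds > len(subjects):
        raise ValueError("n_folds cannot exceed the number of subjects")

    for k in range(n_folds):
        # Every n_folds-th subject, starting at offset k, is held out for testing.
        test_subjects = set(subjects[k::n_folds])
        train_idx = [i for s in subjects if s not in test_subjects for i in by_subject[s]]
        test_idx = [i for s in subjects if s in test_subjects for i in by_subject[s]]
        yield train_idx, test_idx


# Toy example: six segments recorded from four subjects.
subject_ids = ["s1", "s1", "s2", "s3", "s3", "s4"]
folds = list(cross_subject_folds(subject_ids, n_folds=2))
```

Splitting by subject rather than by segment keeps segments from the same person out of both the training and test sets, which is the property that distinguishes cross-subject evaluation from a plain random split.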