Brain4FMs: A Benchmark of Foundation Models for Electrical Brain Signal

Fanqi Shen; Enhong Yang; Jiahe Li; Junru Hong; Xiaoran Pan; Zhizhang Yuan; Meng Li; Yang Yang

TL;DR

Brain4FMs presents a unified benchmark and taxonomy, centered on self-supervised learning (SSL), for Brain Foundation Models (BFMs) applied to EEG and intracranial EEG (iEEG) signals. It builds an open platform that evaluates 15 BFMs on 18 public datasets spanning disease diagnosis, sleep staging, communication, and affective computing, with standardized preprocessing and cross-subject finetuning. The study reveals modality-driven transfer patterns, distinct strengths of generative and contrastive SSL strategies, and the importance of spatial/topological and frequency-domain representations for cross-subject generalization. The framework enables principled, reproducible comparisons and offers actionable insights for designing more transferable BFMs with clinical impact. Overall, Brain4FMs provides a scalable, extensible benchmark to accelerate the development and evaluation of foundation models for electrical brain signals.
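
Cross-subject finetuning hinges on one constraint: no subject may contribute samples to both the training and the test fold. Below is a minimal sketch of a subject-grouped k-fold split. It assumes every sample carries a subject ID. `subject_kfold` is a hypothetical helper, not taken from the Brain4FMs code, and the benchmark's actual fold assignment may differ. scikit-learn's `GroupKFold` offers a library version of the same idea.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def subject_kfold(
    subject_ids: Sequence[str], n_folds: int = 5
) -> List[Tuple[List[int], List[int]]]:
    """Hypothetical helper: (train_idx, test_idx) pairs where each subject
    appears in exactly one test fold and never in its own training fold."""
    # Group sample indices by subject so a subject is never split.
    by_subject: Dict[str, List[int]] = defaultdict(list)
    for idx, sid in enumerate(subject_ids):
        by_subject[sid].append(idx)

    subjects = sorted(by_subject)  # deterministic order; shuffle with a seeded RNG if preferred
    if n_folds < 2 or n_folds > len(subjects):
        raise ValueError("n_folds must be between 2 and the number of subjects")

    # Round-robin assignment of whole subjects to folds.
    fold_subjects = [subjects[f::n_folds] for f in range(n_folds)]

    splits = []
    for f in range(n_folds):
        held_out = set(fold_subjects[f])
        test_idx = [i for s in fold_subjects[f] for i in by_subject[s]]
        train_idx = [i for s in subjects if s not in held_out for i in by_subject[s]]
        splits.append((sorted(train_idx), sorted(test_idx)))
    return splits


if __name__ == "__main__":
    ids = ["s1", "s1", "s2", "s3", "s3", "s4"]
    for train, test in subject_kfold(ids, n_folds=2):
        print(train, test)  # → [2, 5] [0, 1, 3, 4], then [0, 1, 3, 4] [2, 5]
```

Splitting by subject rather than by sample is what makes the reported scores measure generalization to unseen people instead of memorization of subject-specific signal characteristics.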

Abstract

Brain Foundation Models (BFMs) are transforming neuroscience by enabling scalable and transferable learning from neural signals, advancing both clinical diagnostics and cutting-edge neuroscience exploration. Their emergence is powered by large-scale clinical recordings, particularly electroencephalography (EEG) and intracranial EEG (iEEG), which provide rich temporal and spatial representations of brain dynamics. However, despite their rapid proliferation, the field lacks a unified understanding of existing methodologies and a standardized evaluation framework. To fill this gap, we map the benchmark design space along two axes: (i) from the model perspective, we organize BFMs under a self-supervised learning (SSL) taxonomy; and (ii) from the dataset perspective, we summarize common downstream tasks and curate representative public datasets across clinical and human-centric neurotechnology applications. Building on this consolidation, we introduce Brain4FMs, an open evaluation platform with plug-and-play interfaces that integrates 15 representative BFMs and 18 public datasets. It enables standardized comparisons and analysis of how pretraining data, SSL strategies, and architectures affect generalization and downstream performance, guiding the design of more accurate and transferable BFMs. The code is available at https://anonymous.4open.science/r/Brain4FMs-85B8.
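
The abstract describes plug-and-play interfaces for integrating models. One common way to get that property is a name-to-factory registry, and the sketch below assumes that pattern. `BrainEncoder`, `register_model`, `build_model`, `MODEL_REGISTRY`, and `MeanBaseline` are hypothetical names chosen for illustration, not the Brain4FMs API. The linked repository defines the actual interface.

```python
from typing import Callable, Dict, List, Protocol, Sequence


class BrainEncoder(Protocol):
    """Hypothetical interface a wrapped BFM exposes to the benchmark."""

    def embed(self, signal: Sequence[Sequence[float]]) -> List[float]:
        """Map a (channels x time) recording to a feature vector."""
        ...


# Hypothetical registry: model name -> zero-argument factory (e.g. a class).
MODEL_REGISTRY: Dict[str, Callable[[], BrainEncoder]] = {}


def register_model(name: str):
    """Decorator that adds a model factory to the registry under `name`."""

    def decorator(factory: Callable[[], BrainEncoder]) -> Callable[[], BrainEncoder]:
        if name in MODEL_REGISTRY:
            raise KeyError(f"model {name!r} is already registered")
        MODEL_REGISTRY[name] = factory
        return factory

    return decorator


def build_model(name: str) -> BrainEncoder:
    """Instantiate a registered model by name."""
    if name not in MODEL_REGISTRY:
        raise KeyError(f"unknown model {name!r}; available: {sorted(MODEL_REGISTRY)}")
    return MODEL_REGISTRY[name]()


@register_model("mean_baseline")
class MeanBaseline:
    """Toy stand-in for a BFM: per-channel mean over time."""

    def embed(self, signal: Sequence[Sequence[float]]) -> List[float]:
        return [sum(ch) / len(ch) for ch in signal]


if __name__ == "__main__":
    model = build_model("mean_baseline")
    print(model.embed([[1.0, 3.0], [2.0, 4.0]]))  # → [2.0, 3.0]
```

Under this design, adding a model requires only a class with an `embed` method and one decorator line, while the evaluation loop looks models up by name and stays unchanged.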

Paper Structure (47 sections, 7 equations, 4 figures, 40 tables)

Introduction
Model Taxonomy
Contrastive-based Methods
Augmentation Contrast
Contrastive Predictive Coding
Cross-modal Contrast
Generative-based Methods
Autoregressive-based
Autoencoder-based
Other Advanced Methods
Explicit Predictive-based
Hybrid-based
Instruction-tuned
Benchmark
Pipeline

Figures (4)

Figure 1: Overview of BFMs. (a) A unified pretraining pipeline. EEG/iEEG recordings are preprocessed, encoded into latent representations, and optimized under different SSL paradigms. (b) Model scale statistics. Parameter-size buckets by training paradigm are shown for the reported subset, using each model’s maximum parameter count, alongside yearly model counts under the same grouping. (c) Timeline-style family tree of representative BFMs from 2021 to 2025, organized by paradigm and annotated with major methodological shifts.
Figure 2: Overview of benchmark pipeline. (a) Data acquisition scenarios covering sleep staging, affective computing, disease diagnosis, and communication. (b) Evaluation under cross-subject and cross-validation protocols, with a standardized EEG/iEEG preprocessing pipeline.
Figure 3: Supplemental analyses. (a) Box plots of decision-boundary diagnostics (q = 0.10) comparing contrastive and generative models on cross-subject tasks. (b) AUROC of NeuroGPT variants across tasks. (c) Slopegraph of AUROC changes from original to channel-permuted inputs, comparing spatially strong and weak models. (d) Heatmap of Spearman's rank correlation between band-wise predictability and model performance rankings across tasks. Points in (a–c) denote n-fold cross-validation means; asterisks in (c–d) indicate * (p < 0.05) and ** (p < 0.001).
Figure 4: Sensitivity of decision-boundary error concentration to the boundary definition.
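
Figure 3(d) relies on Spearman's rank correlation, which is the Pearson correlation computed on ranks, so it measures whether two orderings of models agree rather than whether raw scores are linearly related. Below is a minimal pure-Python sketch using average ranks for ties. The function names and the input scores are hypothetical; the paper's definition of band-wise predictability is not reproduced here. In practice, `scipy.stats.spearmanr` computes the same statistic along with the p-value used for significance markers.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks (1 = smallest); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j across the run of values tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Synthetic per-model scores for illustration only (not results from the paper).
    band_predictability = [0.31, 0.45, 0.28, 0.52]
    auroc = [0.70, 0.78, 0.66, 0.81]
    # The two rankings agree exactly, so this prints 1.0 up to floating-point rounding.
    print(spearman_rho(band_predictability, auroc))
```

Because only ranks enter the computation, a monotone rescaling of either score, such as switching from accuracy to AUROC with the same model ordering, leaves rho unchanged.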
