Standing on the Shoulders of Giants: Rethinking EEG Foundation Model Pretraining via Multi-Teacher Distillation

Chenqi Li; Yu Liu; Shuo Zhang; Timothy Denison; Tingting Zhu

TL;DR

This work proposes the Multi-Teacher Distillation Pretraining (MTDP) framework, which pretrains EEG foundation models via two-stage multi-teacher distillation, and demonstrates that mainstream foundation models, such as those from vision and time series, transfer surprisingly well to the EEG domain.

Abstract

Pretraining for electroencephalogram (EEG) foundation models has predominantly relied on self-supervised masked reconstruction, a paradigm largely adapted from and inspired by the success of vision and language foundation models. However, unlike images and text, EEG datasets are notoriously expensive to collect and are characterized by a low signal-to-noise ratio. These challenges make it difficult to scale EEG foundation models and to capture the underlying neural semantics through reconstruction. In this work, we ask the question: can we stand on the shoulders of well-established foundation models from well-represented modalities to bootstrap the pretraining of EEG foundation models? We first demonstrate that mainstream foundation models, such as those from vision and time series, transfer surprisingly well to the EEG domain. Building on this finding, we propose the Multi-Teacher Distillation Pretraining (MTDP) framework for pretraining EEG foundation models via two-stage multi-teacher distillation. In the first stage, we introduce a learnable gating network to fuse representations from diverse teachers (e.g., DINOv3 and Chronos) via a masked latent denoising objective. In the second stage, we distill the fused representation into an EEG foundation model. Extensive evaluations across 9 downstream tasks and 12 datasets demonstrate that our MTDP-based EEG foundation model outperforms its self-supervised counterparts while requiring only 25% of the pretraining data.
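The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`fuse_teachers`, `distillation_loss`), the softmax gate over fixed per-teacher logits, and the mean-squared-error distillation loss are all assumptions made for illustration. The sketch also assumes the teacher embeddings have already been projected to a common dimension, and it omits the masked latent denoising objective that trains the gate in the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_teachers(teacher_embeddings, gate_logits):
    """Stage 1 (sketch): fuse teacher embeddings with gate weights.

    teacher_embeddings: one embedding (list of floats) per frozen teacher,
        assumed here to share a common dimension.
    gate_logits: one score per teacher. In the paper these come from a
        learnable gating network; here they are plain inputs (assumption).
    Returns the fused embedding and the gate weights.
    """
    weights = softmax(gate_logits)
    dim = len(teacher_embeddings[0])
    fused = [0.0] * dim
    for w, emb in zip(weights, teacher_embeddings):
        for i, v in enumerate(emb):
            fused[i] += w * v
    return fused, weights


def distillation_loss(student_embedding, fused_target):
    """Stage 2 (sketch): align the student with the fused target.

    Mean squared error is an assumed choice of alignment loss.
    """
    n = len(fused_target)
    return sum((s - t) ** 2 for s, t in zip(student_embedding, fused_target)) / n


if __name__ == "__main__":
    # Two hypothetical teachers with 2-dimensional embeddings.
    teachers = [[1.0, 0.0], [0.0, 1.0]]
    fused, weights = fuse_teachers(teachers, gate_logits=[0.0, 0.0])
    print("gate weights:", weights)
    print("fused target:", fused)
    print("loss for a zero student:", distillation_loss([0.0, 0.0], fused))
```

In a real pipeline the gate would be a small trainable network and both stages would be optimized by gradient descent; the sketch only shows how the fused target is formed and how a student representation is scored against it.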

Paper Structure (29 sections, 9 equations, 5 figures, 7 tables, 1 algorithm)

Introduction
Related Works
EEG Foundation Model
Repurposing Models for Time Series
Multi-Teacher Distillation
Preliminary Experiment
Methodology
Problem Definition
Two-Stage Multi-Teacher Distillation Pretraining
Stage 1: Teacher Representation Fusion
Stage 2: Knowledge Distillation
Experiments
Experimental Setups
Teacher and Student Models
Preprocessing EEG for Teachers
...and 14 more sections

Figures (5)

Figure 1: Comparison of EEG foundation model pretraining approaches. a) Conventional self-supervised pretraining, in which the EEG foundation model reconstructs missing patches in the temporal, frequency, or latent domain. b) The proposed framework, which bootstraps EEG foundation model pretraining by standing on the shoulders of well-established foundation models from well-represented modalities.
Figure 2: Linear probing performance of CBraMod and DINOv3 on EEG downstream tasks. Balanced Accuracy (%).
Figure 3: Overview of Two-Stage Multi-Teacher Distillation Pretraining (MTDP). Stage 1: Teacher Representation Fusion. A learnable gating network is introduced to weigh and fuse representations from frozen teacher models. The gate is trained via a masked latent denoising objective. Stage 2: Knowledge Distillation. The fused teacher representation acts as the target to pretrain the student EEG foundation model. The distillation loss is minimized to align the student representations with the fused representations.
Figure 4: Linear probing performance of CBraMod and CBraMod-MTDP on EEG downstream tasks. Balanced Accuracy (%).
Figure 5: Loss curves of Stage 1 and Stage 2 pretraining.
