One Model for All: Universal Pre-training for EEG based Emotion Recognition across Heterogeneous Datasets and Paradigms

Xiang Li, You Li, Yazhou Zhang

TL;DR

The paper tackles profound heterogeneity in EEG-based emotion recognition across datasets by proposing a universal pre-training framework. It decouples learning into univariate channel-wise pre-training using contrastive SSL with a Unified Channel Schema (UCS), followed by per-subject fine-tuning with an Adaptive Resampling Transformer (ART) and Graph Attention Network (GAT) classifier. The approach achieves state-of-the-art within-subject performance on SEED, DEAP, and DREAMER, and demonstrates strong cross-dataset transfer, even surpassing within-domain upper bounds in some cases. Ablation studies show the necessity of pre-training, the critical role of GAT in handling noisy data, and the advantage of the ART encoder, collectively enabling scalable, universal EEG foundation models for diverse analysis tasks.

Abstract

EEG-based emotion recognition is hampered by profound dataset heterogeneity (channel/subject variability), hindering generalizable models. Existing approaches struggle to transfer knowledge effectively. We propose 'One Model for All', a universal pre-training framework for EEG analysis across disparate datasets. Our paradigm decouples learning into two stages: (1) Univariate pre-training via self-supervised contrastive learning on individual channels, enabled by a Unified Channel Schema (UCS) that leverages the channel union (e.g., SEED-62ch, DEAP-32ch); (2) Multivariate fine-tuning with a novel 'ART' (Adaptive Resampling Transformer) and 'GAT' (Graph Attention Network) architecture to capture complex spatio-temporal dependencies. Experiments show universal pre-training is an essential stabilizer, preventing collapse on SEED (vs. scratch) and yielding substantial gains on DEAP (+7.65%) and DREAMER (+3.55%). Our framework achieves new SOTA performance on all within-subject benchmarks: SEED (99.27%), DEAP (93.69%), and DREAMER (93.93%). We also show SOTA cross-dataset transfer, achieving 94.08% (intersection) and 93.05% (UCS) on the unseen DREAMER dataset, with the former surpassing the within-domain pre-training benchmark. Ablation studies validate our architecture: the GAT module is critical, yielding a +22.19% gain over GCN on the high-noise DEAP dataset, and its removal causes a catastrophic -16.44% performance drop. This work paves the way for more universal, scalable, and effective pre-trained models for diverse EEG analysis tasks.
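The channel-union idea behind the Unified Channel Schema can be illustrated with a minimal sketch. It assumes that channels are matched by their electrode names and that each unique name receives an integer Global ID for an embedding lookup. The function names (`build_channel_schema`, `to_global_ids`) and the two toy montages are hypothetical; they are not the real SEED or DEAP channel lists, and this is not the authors' code.

```python
def build_channel_schema(*montages):
    """Assign a stable global ID to every unique channel name (the 'union')."""
    vocab = {}
    for montage in montages:
        for name in montage:
            key = name.upper()  # assumption: names are matched case-insensitively
            if key not in vocab:
                vocab[key] = len(vocab)
    return vocab


def to_global_ids(montage, vocab):
    """Translate one dataset's channel names into global IDs."""
    return [vocab[name.upper()] for name in montage]


# Toy montages for illustration only (hypothetical, not the real dataset layouts).
montage_a = ["Fp1", "Fp2", "F3", "F4", "Cz", "O1", "O2"]
montage_b = ["Fp1", "F3", "Cz", "O1"]  # a subset of montage_a

vocab = build_channel_schema(montage_a, montage_b)
ids_a = to_global_ids(montage_a, vocab)
ids_b = to_global_ids(montage_b, vocab)

# The 'intersection' setting keeps only the channels shared by both montages.
shared_ids = sorted(set(ids_a) & set(ids_b))
```

Because the second montage is a subset of the first, the union has the same size as the larger montage, which mirrors the 62-channel vocabulary the paper reports for SEED and DEAP. The IDs in `ids_b` could then index a learnable embedding table, so that one univariate encoder can tell channels apart regardless of the source dataset.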

Paper Structure

This paper contains 37 sections, 3 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: A conceptual overview of our proposed framework. (Left) The core challenge: EEG datasets are highly heterogeneous in channels, sampling rates, and subjects. (Right) Our solution: A universal pre-training framework designed to learn a unified representation from these disparate sources.
  • Figure 2: Illustration of the Unified Channel Schema (UCS). Channels from heterogeneous datasets (e.g., SEED-62ch, DEAP-32ch) are mapped via a 'union' operation. Notably, as all 32 channels from DEAP are a subset of the 62 channels from SEED, this results in a global vocabulary containing all unique channels (62 total). Each channel name is assigned a Global ID, which is then used by the model's embedding layer to retrieve a learnable channel-specific representation.
  • Figure 3: The overall architecture of our proposed 'Univariate Pre-training + Multivariate Fine-tuning' framework. (Left) The Univariate Pre-training stage learns a shared Transformer Encoder ($f_{\theta}$) using self-supervised contrastive learning on single, augmented EEG channels. (Right) The Multivariate Fine-tuning stage loads these weights (indicated by the 'Load Pre-trained Weights' arrow). This downstream classifier uses the shared pre-trained encoder ($f_{\theta}$) as its core feature extractor. The encoder's weights are fine-tuned (typically with a low differential learning rate) as part of the larger spatio-temporal model, which then uses a Linear Projection, a Transformer Encoder, and a Channel Interaction Layer (GAT) to capture complex dependencies and make the final emotion prediction.
  • Figure 4: Universal pre-training learns semantically structured representations, and the GAT module captures emotion-specific neural patterns. (Top Row) t-SNE visualization of feature embeddings from the DREAMER dataset. (a) The model trained from scratch produces a largely undifferentiated feature space. (b) In stark contrast, our pre-trained model transforms this space, organizing the features into highly structured manifolds. (Middle and Bottom Rows) GAT interpretability analysis for a single representative subject ('djc') comparing distinct emotional states. (Middle Row) (c) and (d) visualize the top 15% of GAT attention weights, revealing different functional connectivity patterns for Positive vs. Negative emotions. (Bottom Row) (e) and (f) highlight the most critical channels (nodes) identified via degree centrality (Degree $\ge$ 8), showing a clear shift in the model's spatial focus (e.g., from left-temporal (e) to bilateral posterior-parietal (f)) depending on the emotion being processed.
  • Figure 5: t-SNE visualization of the learned feature representations for a single subject after personalized fine-tuning. (a) Embeddings colored by cluster labels assigned by an unsupervised K-Means algorithm ($k=3$). (b) The same embeddings colored by their ground-truth emotion labels. The clear formation of distinct clusters corresponding to the true labels and their strong alignment with the K-Means results validate the model's ability to learn a highly discriminative and semantically structured feature space.
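The interpretability analysis described for Figure 4 (keeping the top 15% of GAT attention weights, then flagging channels with degree of at least 8) can be sketched as follows. This is a rough illustration under stated assumptions: the attention matrix here is random toy data, self-loops are ignored, and degree is counted on the undirected version of the thresholded graph. The paper summary does not say whether the authors use in-degree, out-degree, or undirected degree, so that choice, like the function names, is hypothetical.

```python
import random


def top_fraction_edges(attn, fraction=0.15):
    """Return the directed edges (i, j), i != j, with the largest weights."""
    n = len(attn)
    weighted = [(attn[i][j], i, j) for i in range(n) for j in range(n) if i != j]
    weighted.sort(reverse=True)
    keep = round(fraction * len(weighted))
    return {(i, j) for _, i, j in weighted[:keep]}


def degree_centrality(edges, n):
    """Count distinct neighbours per node, treating edges as undirected."""
    neighbours = [set() for _ in range(n)]
    for i, j in edges:
        neighbours[i].add(j)
        neighbours[j].add(i)
    return [len(s) for s in neighbours]


# Toy attention matrix: random weights standing in for real GAT attention.
random.seed(0)
n_channels = 62  # matches the 62-channel count quoted for SEED
attn = [[random.random() for _ in range(n_channels)] for _ in range(n_channels)]

edges = top_fraction_edges(attn, fraction=0.15)
degree = degree_centrality(edges, n_channels)

# Channels whose degree reaches the threshold used in Figure 4 (Degree >= 8).
hubs = [ch for ch, d in enumerate(degree) if d >= 8]
```

With real attention weights, running this once per emotion class and comparing the resulting `hubs` lists would give the kind of Positive vs. Negative contrast that panels (e) and (f) visualize.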