EMOD: A Unified EEG Emotion Representation Framework Leveraging V-A Guided Contrastive Learning
Yuning Chen, Sha Zhao, Shijian Li, Gang Pan
TL;DR
EMOD tackles cross-dataset generalization in EEG emotion recognition by projecting heterogeneous labels into a unified Valence-Arousal (V-A) space and learning semantically aligned representations with a flexible Triple-Domain Encoder and Spatial-Temporal Transformer. A distance-aware, soft-weighted contrastive objective pulls samples together according to their emotional proximity, and structured cross-dataset sampling provides diverse yet balanced exposure during pretraining on eight datasets. EMOD reports state-of-the-art performance on FACED, SEED-V, and SEED with a compact model, indicating strong generalization across heterogeneous EEG formats and annotations. The resulting representations are transferable, respect the continuous structure of emotions, and adapt to diverse data sources and tasks.
Abstract
Emotion recognition from EEG signals is essential for affective computing and has been widely explored using deep learning. Although recent approaches achieve strong performance on individual EEG emotion datasets, their generalization across datasets remains limited by heterogeneity in annotation schemes and data formats. Existing models typically require dataset-specific architectures tailored to the input structure and lack semantic alignment across diverse emotion labels. To address these challenges, we propose EMOD: A Unified EEG Emotion Representation Framework Leveraging Valence-Arousal (V-A) Guided Contrastive Learning. EMOD learns transferable and emotion-aware representations from heterogeneous datasets by bridging both semantic and structural gaps. Specifically, we project discrete and continuous emotion labels into a unified V-A space and formulate a soft-weighted supervised contrastive loss that encourages emotionally similar samples to cluster in the latent space. To accommodate variable EEG formats, EMOD employs a flexible backbone comprising a Triple-Domain Encoder followed by a Spatial-Temporal Transformer, enabling robust extraction and integration of temporal, spectral, and spatial features. We pretrain EMOD on eight public EEG datasets and evaluate it on three benchmark datasets. Experimental results show that EMOD achieves state-of-the-art performance, demonstrating strong adaptability and generalization across diverse EEG-based emotion recognition scenarios.
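To make the V-A guided objective more concrete, the following is a minimal sketch of one way a soft-weighted supervised contrastive loss could be written. It is not the authors' implementation. The V-A coordinates in `VA_COORDS`, the Gaussian weighting kernel, the parameters `temperature` and `sigma`, and all function names are assumptions introduced here for illustration; the abstract does not specify them.

```python
import math

# Hypothetical V-A coordinates in [-1, 1] for discrete labels.
# Illustrative values only; the paper's actual mapping is not given here.
VA_COORDS = {
    "happy": (0.8, 0.6),
    "sad": (-0.7, -0.4),
    "fear": (-0.6, 0.7),
    "neutral": (0.0, 0.0),
}


def to_va(label):
    """Project a discrete label (str) or a continuous (valence, arousal)
    pair into the shared V-A space."""
    if isinstance(label, str):
        return VA_COORDS[label]
    valence, arousal = label
    return (float(valence), float(arousal))


def _normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def soft_weighted_contrastive_loss(embeddings, labels, temperature=0.1, sigma=0.5):
    """Contrastive loss in which every other sample acts as a soft positive,
    weighted by its closeness to the anchor in V-A space.

    Assumed weighting: w_ij = exp(-d_ij^2 / (2 * sigma^2)), where d_ij is the
    Euclidean V-A distance. Requires at least two samples.
    """
    z = [_normalize(e) for e in embeddings]
    va = [to_va(label) for label in labels]
    n = len(z)
    total = 0.0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # Cosine similarity between anchor i and every other sample.
        logits = [
            sum(a * b for a, b in zip(z[i], z[j])) / temperature for j in others
        ]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Soft positive weights from V-A proximity, normalized per anchor.
        weights = [
            math.exp(
                -((va[i][0] - va[j][0]) ** 2 + (va[i][1] - va[j][1]) ** 2)
                / (2 * sigma ** 2)
            )
            for j in others
        ]
        w_sum = sum(weights)
        total += -sum(w / w_sum * (l - log_denom) for w, l in zip(weights, logits))
    return total / n
```

In this sketch a batch can mix discrete labels and continuous ratings, for example `["happy", (0.7, 0.5), "sad", (-0.6, -0.5)]`, because both are resolved to V-A points before the weights are computed. The loss is lower when embeddings of emotionally close samples point in similar directions, which is the clustering behavior the abstract describes.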
