EMOD: A Unified EEG Emotion Representation Framework Leveraging V-A Guided Contrastive Learning

Yuning Chen, Sha Zhao, Shijian Li, Gang Pan

TL;DR

EMOD tackles cross-dataset generalization in EEG emotion recognition by projecting heterogeneous labels into a unified Valence-Arousal space and learning semantically aligned representations with a flexible Triple-Domain Encoder and Spatial-Temporal Transformer. A distance-aware, soft-weighted contrastive objective ties samples by emotional proximity, and structured cross-dataset sampling ensures diverse yet balanced exposure during pretraining on eight datasets. Empirical results show state-of-the-art performance on FACED, SEED-V, and SEED with a remarkably small model footprint, validating strong generalization across heterogeneous EEG formats and annotations. The approach advances affective computing by delivering robust, transferable EEG representations that respect the continuous structure of emotions and adapt to diverse data sources and tasks.

Abstract

Emotion recognition from EEG signals is essential for affective computing and has been widely explored using deep learning. While recent deep learning approaches have achieved strong performance on single EEG emotion datasets, their generalization across datasets remains limited by heterogeneity in annotation schemes and data formats. Existing models typically require dataset-specific architectures tailored to the input structure and lack semantic alignment across diverse emotion labels. To address these challenges, we propose EMOD: A Unified EEG Emotion Representation Framework Leveraging Valence-Arousal (V-A) Guided Contrastive Learning. EMOD learns transferable and emotion-aware representations from heterogeneous datasets by bridging both semantic and structural gaps. Specifically, we project discrete and continuous emotion labels into a unified V-A space and formulate a soft-weighted supervised contrastive loss that encourages emotionally similar samples to cluster in the latent space. To accommodate variable EEG formats, EMOD employs a flexible backbone comprising a Triple-Domain Encoder followed by a Spatial-Temporal Transformer, enabling robust extraction and integration of temporal, spectral, and spatial features. We pretrain EMOD on eight public EEG datasets and evaluate its performance on three benchmark datasets. Experimental results show that EMOD achieves state-of-the-art performance, demonstrating strong adaptability and generalization across diverse EEG-based emotion recognition scenarios.
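To make the idea of a soft-weighted supervised contrastive loss in V-A space more concrete, the following is a minimal NumPy sketch, not the authors' implementation. Everything specific in it is an assumption made for illustration: the `VA_COORDS` values are hypothetical placeholders rather than the paper's label mapping, the Gaussian kernel over V-A distance is one plausible choice of soft weight, and the names `soft_weights`, `soft_weighted_contrastive_loss`, `sigma`, and `temperature` are invented here.

```python
import numpy as np

# Hypothetical V-A coordinates in [-1, 1] for a few discrete labels.
# Illustrative placeholders only; the paper's actual mapping is not given here.
VA_COORDS = {
    "happy": (0.8, 0.5),
    "neutral": (0.0, 0.0),
    "sad": (-0.7, -0.4),
    "fear": (-0.6, 0.7),
}


def soft_weights(va, sigma=0.5):
    """Pairwise weights that decay with V-A distance (assumed Gaussian kernel)."""
    diff = va[:, None, :] - va[None, :, :]        # (n, n, 2) pairwise differences
    dist2 = np.sum(diff ** 2, axis=-1)            # squared V-A distance
    w = np.exp(-dist2 / (2.0 * sigma ** 2))
    np.fill_diagonal(w, 0.0)                      # a sample is not its own positive
    return w


def soft_weighted_contrastive_loss(z, va, temperature=0.1, sigma=0.5):
    """Contrastive loss where every other sample is a positive, weighted by V-A proximity."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # cosine similarity via unit vectors
    sim = (z @ z.T) / temperature
    n = z.shape[0]
    off_diag = ~np.eye(n, dtype=bool)

    # Log-softmax of each anchor's similarities over all *other* samples.
    sim_masked = np.where(off_diag, sim, -np.inf)
    row_max = sim_masked.max(axis=1, keepdims=True)
    log_denom = row_max + np.log(
        np.exp(sim_masked - row_max).sum(axis=1, keepdims=True)
    )
    log_prob = np.where(off_diag, sim - log_denom, 0.0)

    # Weighted average of log-probabilities, normalised per anchor.
    w = soft_weights(va, sigma)
    per_anchor = -(w * log_prob).sum(axis=1) / np.maximum(w.sum(axis=1), 1e-12)
    return float(per_anchor.mean())


# Toy usage: four samples, two "happy" and two "sad".
labels = ["happy", "happy", "sad", "sad"]
va = np.array([VA_COORDS[name] for name in labels])
z_aligned = np.array([[1.0, 0.0], [1.0, 0.1], [-1.0, 0.0], [-1.0, 0.1]])
z_shuffled = np.array([[1.0, 0.0], [-1.0, 0.0], [1.0, 0.1], [-1.0, 0.1]])
loss_aligned = soft_weighted_contrastive_loss(z_aligned, va)
loss_shuffled = soft_weighted_contrastive_loss(z_shuffled, va)
```

In this toy setup, embeddings that place emotionally close samples together (`z_aligned`) should yield a lower loss than embeddings that pair each sample with an emotionally distant one (`z_shuffled`). Because the weights are continuous, samples with nearby but non-identical V-A coordinates still pull on each other, which is the property that distinguishes this from a hard same-class/different-class contrastive objective.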

Paper Structure

This paper contains 27 sections, 6 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Comparison between conventional EEG emotion recognition and our EMOD framework. Traditional methods rely on dataset-specific supervision and manually designed architectures, limiting generalization across datasets. In contrast, EMOD employs V-A guided pretraining on heterogeneous data, enabling robust and generalizable emotion-aware representations.
  • Figure 2: Overview of the EMOD framework.
  • Figure 3: Ablation study on model structure. The red dashed line on top indicates the BACC of the complete EMOD model.
  • Figure 4: t-SNE visualization of EEG samples from the DEAP dataset. Left: EMOD without pretraining; right: pretrained EMOD. Dots are color-coded by valence (top) and arousal (bottom), and green circled numbers indicate the centroids of discrete V-A scores (–4 to 4). Compared to the model without pretraining, the pretrained EMOD produces more structured and clearly separated clusters. In addition, the centroids follow a smooth emotional gradient, preserving the valence/arousal ordering in the learned representation space.