Cross-domain EEG-based Emotion Recognition with Contrastive Learning

Rui Yan, Yibo Li, Han Ding, Fei Wang

TL;DR

This work tackles cross-domain EEG-based emotion recognition by reframing it as EEG-text matching within a CLIP framework. It introduces EmotionCLIP, combining a frozen CLIP text encoder with the SST-LegoViT EEG backbone to align EEG features with semantic emotion descriptions, improving cross-subject and cross-time generalization. The approach yields state-of-the-art cross-subject and cross-time accuracies on SEED and SEED-IV, demonstrating the effectiveness of multimodal contrastive learning for robust affective computing. The results suggest that anchoring EEG representations to a stable multimodal semantic space can substantially enhance cross-domain robustness and practical deployment in real-world settings.

Abstract

Electroencephalogram (EEG)-based emotion recognition is vital for affective computing but faces challenges in feature utilization and cross-domain generalization. This work introduces EmotionCLIP, which reformulates recognition as an EEG-text matching task within the CLIP framework. A tailored backbone, SST-LegoViT, captures spatial, spectral, and temporal features using multi-scale convolution and Transformer modules. Experiments on SEED and SEED-IV datasets show superior cross-subject accuracies of 88.69% and 73.50%, and cross-time accuracies of 88.46% and 77.54%, outperforming existing models. Results demonstrate the effectiveness of multimodal contrastive learning for robust EEG emotion recognition.
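
To make the EEG-text matching idea concrete, the following is a minimal NumPy sketch of CLIP-style matching, not the authors' implementation. It assumes the EEG encoder and the frozen text encoder each output one embedding vector of the same dimension; the random arrays stand in for those outputs, and the function names, the temperature of 0.07, and the use of a plain cross-entropy over class prompts are all illustrative assumptions.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    # Scale each vector to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def match_logits(eeg_emb, text_emb, temperature=0.07):
    # eeg_emb: (N, D) EEG embeddings; text_emb: (C, D) one embedding per emotion prompt.
    # Returns (N, C) temperature-scaled cosine similarities.
    return l2_normalize(eeg_emb) @ l2_normalize(text_emb).T / temperature

def matching_loss(logits, labels):
    # Cross-entropy that pulls each EEG sample toward the prompt of its true emotion.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

rng = np.random.default_rng(0)
# Hypothetical stand-ins: 3 emotion prompts and 4 EEG samples in a 16-dim space.
text_emb = rng.normal(size=(3, 16))
labels = np.array([0, 1, 2, 1])
eeg_emb = text_emb[labels] + 0.1 * rng.normal(size=(4, 16))

logits = match_logits(eeg_emb, text_emb)
pred = logits.argmax(axis=1)          # predicted emotion = best-matching prompt
loss = matching_loss(logits, labels)
```

Under this framing, prediction is a nearest-prompt lookup in the shared space, so the text embeddings can stay fixed while only the EEG side is trained.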

Paper Structure

This paper contains 16 sections, 5 figures, 2 tables.

Figures (5)

  • Figure 1: EmotionCLIP transforms the cross-domain emotion recognition task from a classification problem into an EEG-text matching problem.
  • Figure 2: EEG Encoder sequentially processes spatial, frequency-band, and temporal information from a 4D EEG representation.
  • Figure 3: Spatial Multi-scale Encoder aggregates multi-resolution features to enhance spatial representation across varying scales.
  • Figure 4: LegoFormer processes DE and PSD features through parallel encoders and fuses them via cross-attention, using DE as the primary context to guide the auxiliary PSD information.
  • Figure 5: Experimental Results.
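
The cross-attention fusion described for Figure 4 can be sketched as follows. This is one plausible reading rather than the paper's exact design: it assumes DE tokens supply the queries, PSD tokens supply the keys and values, and the attended PSD information is added back to the DE stream through a residual connection. The single attention head, the token count, the projection matrices, and all names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fuse(de_tokens, psd_tokens, w_q, w_k, w_v):
    # Assumption: DE is the primary stream (queries), PSD is auxiliary (keys/values).
    q = de_tokens @ w_q
    k = psd_tokens @ w_k
    v = psd_tokens @ w_v
    # Scaled dot-product attention: each DE token weights all PSD tokens.
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    # Residual connection keeps the DE features as the main signal.
    fused = de_tokens + weights @ v
    return fused, weights

rng = np.random.default_rng(0)
d = 8                                    # hypothetical feature dimension
de_tokens = rng.normal(size=(5, d))      # stand-in for encoded DE features
psd_tokens = rng.normal(size=(5, d))     # stand-in for encoded PSD features
w_q, w_k, w_v = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))

fused, weights = cross_attention_fuse(de_tokens, psd_tokens, w_q, w_k, w_v)
```

Taking queries from DE means the output keeps one token per DE token, which matches the caption's description of DE as the primary context and PSD as auxiliary.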