Cross-domain EEG-based Emotion Recognition with Contrastive Learning
Rui Yan, Yibo Li, Han Ding, Fei Wang
TL;DR
This work tackles cross-domain EEG-based emotion recognition by reframing it as an EEG-text matching task within the CLIP framework. It introduces EmotionCLIP, which pairs a frozen CLIP text encoder with an EEG backbone, SST-LegoViT, so that EEG features are aligned with semantic descriptions of emotions. The approach achieves state-of-the-art cross-subject and cross-time accuracies on SEED and SEED-IV, showing that multimodal contrastive learning is effective for robust affective computing. The results suggest that anchoring EEG representations to a stable multimodal semantic space can substantially improve cross-domain robustness, a property that matters for practical deployment in real-world settings.
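EEG-text matching can be illustrated with a minimal, dependency-free sketch. It assumes that each emotion class is represented by the embedding of a text prompt (standing in for the output of the frozen CLIP text encoder) and that an EEG trial is mapped to an embedding of the same dimension (standing in for a projected SST-LegoViT output). Prediction picks the class text with the highest cosine similarity, and training minimizes cross-entropy over temperature-scaled similarities. The function names, the toy vectors, the temperature of 0.07 (a common CLIP default) and the per-sample cross-entropy over class prompts are all assumptions for illustration, not details confirmed for EmotionCLIP.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine_logits(eeg_emb: Sequence[float],
                  text_embs: Sequence[Sequence[float]],
                  temperature: float = 0.07) -> List[float]:
    """Cosine similarity between one EEG embedding and every class-text embedding,
    divided by a temperature (0.07 is an assumed, CLIP-style value)."""
    e = l2_normalize(eeg_emb)
    return [sum(a * b for a, b in zip(e, l2_normalize(t))) / temperature
            for t in text_embs]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [x / total for x in exps]


def matching_loss(eeg_batch: Sequence[Sequence[float]],
                  labels: Sequence[int],
                  text_embs: Sequence[Sequence[float]],
                  temperature: float = 0.07) -> float:
    """Hypothetical training objective: mean cross-entropy of each EEG embedding
    against all class-text embeddings, with the true class as the target."""
    total = 0.0
    for emb, y in zip(eeg_batch, labels):
        probs = softmax(cosine_logits(emb, text_embs, temperature))
        total += -math.log(probs[y])
    return total / len(eeg_batch)


def predict(eeg_emb: Sequence[float],
            text_embs: Sequence[Sequence[float]]) -> int:
    """Index of the class text most similar to the EEG embedding."""
    logits = cosine_logits(eeg_emb, text_embs)
    return max(range(len(logits)), key=logits.__getitem__)


# Toy 3-D "text embeddings" for SEED's three classes (negative, neutral, positive).
# The values are made up; real ones would come from the frozen text encoder.
class_text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
eeg_batch = [[0.9, 0.1, 0.0], [0.1, 0.2, 0.95]]
labels = [0, 2]

print([predict(e, class_text_embs) for e in eeg_batch])  # [0, 2]
print(matching_loss(eeg_batch, labels, class_text_embs))
```

Because the text side is frozen in this setup, the class embeddings act as fixed anchors and only the EEG embeddings move toward them during training, which is one way to read the "stable multimodal semantic space" idea above.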
Abstract
Electroencephalogram (EEG)-based emotion recognition is vital for affective computing but faces challenges in feature utilization and cross-domain generalization. This work introduces EmotionCLIP, which reformulates recognition as an EEG-text matching task within the CLIP framework. A tailored backbone, SST-LegoViT, captures spatial, spectral, and temporal features using multi-scale convolution and Transformer modules. Experiments on the SEED and SEED-IV datasets yield cross-subject accuracies of 88.69% and 73.50% and cross-time accuracies of 88.46% and 77.54%, respectively, outperforming existing models. These results demonstrate the effectiveness of multimodal contrastive learning for robust EEG emotion recognition.
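The "multi-scale convolution" idea can be sketched on a single EEG channel: the same signal is filtered by several kernels of different widths in parallel, so each output map responds to patterns at a different temporal scale, and the maps are kept side by side for a later stage (such as a Transformer module) to combine. This is a generic illustration, not SST-LegoViT itself. The kernel sizes (3, 5, 7), the fixed moving-average kernels (a learned model would train its kernel weights), and the function names are hypothetical, and the spatial and spectral branches are not modeled.

```python
from typing import List, Sequence, Tuple


def conv1d_same(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """1-D cross-correlation with implicit zero padding, so the output has the
    same length as the input (for odd kernel sizes the window is centered)."""
    n, k = len(signal), len(kernel)
    pad_left = (k - 1) // 2
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - pad_left
            if 0 <= idx < n:  # positions outside the signal count as zero
                acc += w * signal[idx]
        out.append(acc)
    return out


def multi_scale_conv(signal: Sequence[float],
                     kernel_sizes: Tuple[int, ...] = (3, 5, 7)) -> List[List[float]]:
    """Filter one channel at several temporal scales in parallel.
    Moving-average kernels are placeholders for learned kernels."""
    maps = []
    for k in kernel_sizes:
        kernel = [1.0 / k] * k
        maps.append(conv1d_same(signal, kernel))
    return maps  # one feature map per scale, all the same length as the input


# An impulse makes the receptive field of each scale visible.
signal = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
for size, fmap in zip((3, 5, 7), multi_scale_conv(signal)):
    print(size, [round(v, 3) for v in fmap])
```

Wider kernels spread the impulse over more time steps, which is the sense in which parallel branches capture short- and longer-range temporal structure before any attention layer is applied.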
