Joint Contrastive Learning with Feature Alignment for Cross-Corpus EEG-based Emotion Recognition
Qile Liu, Zhihao Zhou, Jiyuan Wang, Zhen Liang
TL;DR
This paper tackles cross-corpus EEG-based emotion recognition by proposing JCFA, a two-stage framework that combines self-supervised joint time-frequency contrastive learning with graph-based supervised fine-tuning. In pre-training, JCFA learns robust time-domain, frequency-domain, and joint time-frequency embeddings without labels, using the losses $\mathcal{L}_{\rm T}$, $\mathcal{L}_{\rm F}$, and $\mathcal{L}_{\rm A}$ with temperature $\tau$. Fine-tuning adds a graph convolutional network that exploits the spatial relationships among electrodes, together with a classifier for emotion labels, and optimizes a weighted combination of $\mathcal{L}_{\rm T}$, $\mathcal{L}_{\rm F}$, $\mathcal{L}_{\rm A}$, and $\mathcal{L}_{\rm cls}$. Experiments on SEED and SEED-IV demonstrate state-of-the-art cross-corpus accuracy, with JCFA outperforming the second-best method by up to 7.02 percentage points, and ablations confirm the importance of each component, including the time-frequency alignment. The approach offers a practical, data-efficient path to robust EEG-based emotion recognition, with potential extensions to other cross-corpus EEG tasks and real-world HCI applications.
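To make the role of the temperature $\tau$ concrete, here is a minimal sketch of a temperature-scaled contrastive loss. It assumes that $\mathcal{L}_{\rm T}$, $\mathcal{L}_{\rm F}$, and $\mathcal{L}_{\rm A}$ take the NT-Xent form popularized by SimCLR. Under that form, two embeddings of the same EEG sample form a positive pair and every other embedding in the batch acts as a negative. The paper's exact loss definitions may differ. The names `nt_xent` and `finetune_loss` and the default weights are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; the small floor avoids division by zero for null vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / max(norm, 1e-12)


def nt_xent(z1: List[Vector], z2: List[Vector], tau: float = 0.2) -> float:
    """NT-Xent loss for N positive pairs (z1[i], z2[i]) (assumed form, not the paper's code).

    Each of the 2N embeddings is contrasted with its partner (positive)
    against the remaining 2N - 2 embeddings (negatives), at temperature tau.
    """
    z = list(z1) + list(z2)
    n = len(z1)
    total = 0.0
    for i in range(2 * n):
        j = (i + n) % (2 * n)  # index of the positive partner of z[i]
        sims = {k: cosine(z[i], z[k]) / tau for k in range(2 * n) if k != i}
        m = max(sims.values())  # log-sum-exp trick for numerical stability
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += log_denom - sims[j]  # -log(exp(s_ij) / sum_{k != i} exp(s_ik))
    return total / (2 * n)


def finetune_loss(l_t: float, l_f: float, l_a: float, l_cls: float,
                  weights: Tuple[float, float, float, float] = (1.0, 1.0, 1.0, 1.0)) -> float:
    """Hypothetical weighted sum of the four fine-tuning losses."""
    return sum(w * l for w, l in zip(weights, (l_t, l_f, l_a, l_cls)))


# Toy batch of two samples, e.g. time-view embeddings paired with frequency-view ones.
z_time = [[1.0, 0.0], [0.0, 1.0]]
print(nt_xent(z_time, [[1.0, 0.0], [0.0, 1.0]], tau=0.5))  # pairs agree: low loss
print(nt_xent(z_time, [[0.0, 1.0], [1.0, 0.0]], tau=0.5))  # pairs swapped: high loss
```

Lowering $\tau$ sharpens the softmax over similarities, so negatives that are close to the anchor are penalized more strongly. A weighted sum such as `finetune_loss` is one simple way to combine the pre-training objectives with the classification loss during fine-tuning.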
Abstract
The integration of human emotions into multimedia applications shows great potential for enriching user experiences and enhancing engagement across various digital platforms. Unlike traditional methods such as questionnaires, facial expressions, and voice analysis, brain signals offer a more direct and objective understanding of emotional states. However, in the field of electroencephalography (EEG)-based emotion recognition, previous studies have primarily concentrated on training and testing EEG models within a single dataset, overlooking the variability across different datasets. This oversight leads to significant performance degradation when EEG models are applied to cross-corpus scenarios. In this study, we propose a novel Joint Contrastive learning framework with Feature Alignment (JCFA) to address cross-corpus EEG-based emotion recognition. The JCFA model operates in two main stages. In the pre-training stage, a joint domain contrastive learning strategy is introduced to characterize generalizable time-frequency representations of EEG signals without the use of labeled data. It extracts robust time-based and frequency-based embeddings for each EEG sample and then aligns them within a shared latent time-frequency space. In the fine-tuning stage, JCFA is refined in conjunction with downstream tasks, taking into account the structural connections among brain electrodes, which can further strengthen the model's ability to detect and interpret emotions. Extensive experiments on two widely used emotion datasets (SEED and SEED-IV) show that the proposed JCFA model achieves state-of-the-art (SOTA) performance, outperforming the second-best method by an average accuracy increase of 4.09% in cross-corpus EEG-based emotion recognition tasks.
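The abstract relies on two ingredients: a frequency-based view of each EEG sample, and the structural connections among electrodes used during fine-tuning. The sketch below illustrates both under stated assumptions. It assumes the frequency view is a DFT magnitude spectrum and that the graph layer uses the symmetric normalization of the Kipf and Welling GCN. How JCFA actually builds its frequency input and electrode adjacency is not specified here, so `frequency_view`, `gcn_layer`, and the toy adjacency matrix are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def frequency_view(x: List[float]) -> List[float]:
    """Magnitude spectrum of a 1-D EEG segment via a naive O(n^2) DFT.

    Returns bins 0..n//2 (the non-redundant half for a real-valued signal).
    """
    n = len(x)
    mags = []
    for k in range(n // 2 + 1):
        re = sum(x[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(x[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        mags.append(math.hypot(re, im))
    return mags


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(h: Matrix, adj: Matrix, w: Matrix) -> Matrix:
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    Rows of h are electrodes and columns are features; adj is a symmetric,
    non-negative electrode adjacency matrix (hypothetical here).
    """
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # degrees including self-loops
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    out = matmul(matmul(norm, h), w)
    return [[max(0.0, v) for v in row] for row in out]  # ReLU


# A cosine completing 2 cycles in 8 samples concentrates its energy in bin 2.
signal = [math.cos(2 * math.pi * 2 * t / 8) for t in range(8)]
print([round(m, 6) for m in frequency_view(signal)])

# Two connected electrodes with one feature each: features are averaged over the graph.
print(gcn_layer([[1.0], [3.0]], [[0.0, 1.0], [1.0, 0.0]], [[1.0]]))  # → [[2.0], [2.0]]
```

Symmetric normalization keeps feature magnitudes comparable across electrodes with different numbers of neighbors. A learned or data-driven adjacency could replace the fixed toy matrix used here.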
