Joint Contrastive Learning with Feature Alignment for Cross-Corpus EEG-based Emotion Recognition

Qile Liu, Zhihao Zhou, Jiyuan Wang, Zhen Liang

TL;DR

This paper tackles cross-corpus EEG-based emotion recognition by proposing JCFA, a two-stage framework that combines self-supervised joint time-frequency contrastive learning with graph-based supervised fine-tuning. In pre-training, JCFA learns robust time-domain, frequency-domain, and time-frequency embeddings without labels, using losses $\mathcal{L}_{\rm T}$, $\mathcal{L}_{\rm F}$, and $\mathcal{L}_{\rm A}$ with temperature $\tau$. Fine-tuning introduces a graph convolutional network to exploit spatial electrode information and a classifier for emotion labels, optimizing a weighted sum of $\mathcal{L}_{\rm T}$, $\mathcal{L}_{\rm F}$, $\mathcal{L}_{\rm A}$, and $\mathcal{L}_{\rm cls}$. Experiments on SEED and SEED-IV demonstrate state-of-the-art cross-corpus accuracy, with JCFA outperforming the second-best method by up to 7.02 percentage points, and ablations confirm the importance of each component and of the time-frequency alignment. The approach offers a practical, data-efficient path to robust brain-signal emotion recognition, with potential extension to other cross-corpus EEG tasks and real-world HCI applications.
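
To make the role of the temperature $\tau$ and the loss weights concrete, the following is a minimal sketch of a generic InfoNCE-style contrastive loss and a weighted loss sum. It is not the paper's exact formulation: the function names `info_nce` and `weighted_total`, the default `tau=0.2`, and the simplification that only the other positives in the batch act as negatives are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors, positives, tau=0.2):
    """Mean InfoNCE-style loss over a batch (illustrative, not the paper's loss).

    anchors[i] and positives[i] are embeddings of two views of sample i.
    For anchor i, positives[j] with j != i serve as negatives.
    """
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [cosine(anchors[i], positives[j]) / tau for j in range(n)]
        # Log-sum-exp with the max subtracted for numerical stability.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denominator - logits[i]
    return total / n


def weighted_total(losses, weights):
    """Weighted sum of named loss terms; the weight values are hypothetical."""
    return sum(weights[name] * value for name, value in losses.items())


# Two views that agree per sample give a much smaller loss than mismatched views.
views_a = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce(views_a, [[1.0, 0.0], [0.0, 1.0]])
mismatched = info_nce(views_a, [[0.0, 1.0], [1.0, 0.0]])
print(aligned < mismatched)
```

A smaller `tau` sharpens the softmax over similarities, so hard negatives contribute more to the gradient. In a setup like the one summarized above, the same loss form could be applied separately to time-domain and frequency-domain views and then combined with `weighted_total`.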

Abstract

The integration of human emotions into multimedia applications shows great potential for enriching user experiences and enhancing engagement across various digital platforms. Unlike traditional methods such as questionnaires, facial expressions, and voice analysis, brain signals offer a more direct and objective understanding of emotional states. However, in the field of electroencephalography (EEG)-based emotion recognition, previous studies have primarily concentrated on training and testing EEG models within a single dataset, overlooking the variability across different datasets. This oversight leads to significant performance degradation when EEG models are applied to cross-corpus scenarios. In this study, we propose a novel Joint Contrastive learning framework with Feature Alignment (JCFA) to address cross-corpus EEG-based emotion recognition. The JCFA model operates in two main stages. In the pre-training stage, a joint domain contrastive learning strategy is introduced to characterize generalizable time-frequency representations of EEG signals without the use of labeled data. It extracts robust time-based and frequency-based embeddings for each EEG sample, and then aligns them within a shared latent time-frequency space. In the fine-tuning stage, JCFA is refined in conjunction with downstream tasks, taking the structural connections among brain electrodes into account, which further strengthens the model for emotion detection and interpretation. Extensive experimental results on two well-recognized emotion datasets, SEED and SEED-IV, show that the proposed JCFA model achieves state-of-the-art (SOTA) performance, outperforming the second-best method by an average accuracy increase of 4.09% in cross-corpus EEG-based emotion recognition tasks.
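
As an illustration of how structural connections among electrodes can enter a model, here is a minimal sketch of one graph convolution step using the standard symmetric normalization $D^{-1/2}(A+I)D^{-1/2}$. The abstract does not specify the paper's layer, so this propagation rule, the helper names, and the toy three-electrode chain are assumptions for illustration only.

```python
import math


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a square adjacency matrix."""
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    with_loops = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
                  for i in range(n)]
    degree = [sum(row) for row in with_loops]
    return [[with_loops[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(adj, features, weight):
    """One graph convolution: ReLU(A_norm @ X @ W)."""
    propagated = matmul(normalize_adjacency(adj), features)
    projected = matmul(propagated, weight)
    return [[max(0.0, value) for value in row] for row in projected]


# Hypothetical toy graph: three electrodes connected in a chain 0 - 1 - 2.
adjacency = [[0.0, 1.0, 0.0],
             [1.0, 0.0, 1.0],
             [0.0, 1.0, 0.0]]
node_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # one row per electrode
projection = [[1.0, 0.0], [0.0, 1.0]]                 # identity weights for clarity
for row in gcn_layer(adjacency, node_features, projection):
    print(row)
```

Each output row mixes an electrode's features with those of its neighbours, so spatially adjacent channels share information before classification. In practice the adjacency could be fixed from electrode positions or learned, and the weight matrix would be trained.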

Paper Structure (32 sections, 7 equations, 6 figures, 6 tables, 2 algorithms)

Figures (6)

  • Figure 1: The overall architecture of the proposed JCFA model for cross-corpus EEG-based emotion recognition. JCFA consists of two stages: (1) joint contrastive learning-based self-supervised pre-training stage, and (2) graph convolutional network-based supervised fine-tuning stage.
  • Figure 2: Confusion matrices of E$^2$STN (the second-best method) and the JCFA model for cross-corpus EEG-based emotion recognition on the SEED and SEED-IV datasets.
  • Figure 3: t-SNE visualization of embeddings in the latent time-frequency space. Circles, triangles and pentagrams denote negative, neutral and positive emotions. Green and blue represent the time embeddings and frequency embeddings. The sample marked with a red line is the same sample in two subplots. The dashed line indicates the distance between time- and frequency-based embeddings of the same sample.
  • Figure 4: Comparison of model performance on the SEED and SEED-IV datasets using different distance metrics.
  • Figure A-1: Heat map of the learned node features in $G$.
  • ...and 1 more figure