Multi-modal Mood Reader: Pre-trained Model Empowers Cross-Subject Emotion Recognition

Yihang Dong, Xuhang Chen, Yanyan Shen, Michael Kwok-Po Ng, Tao Qian, Shuqiang Wang

TL;DR

This paper tackles cross-subject EEG emotion recognition by addressing inter-subject variability with a pre-trained EEG encoder and masked brain signal modeling (MBSM). It introduces Mood Reader, a multi-modal architecture that combines DE-based spatio-temporal representations with eye-movement cues, processed through an interlinked spatial-temporal attention module and a multi-level fusion layer. The approach yields state-of-the-art results on SEED and SEED-V datasets and offers biological interpretability via attention visualizations mapped onto electrode locations. The work advances cross-subject affective computing by delivering robust, generalizable representations and insights into emotion-related brain regions.

Abstract

Emotion recognition based on electroencephalography (EEG) has gained significant attention and diversified development in fields such as neural signal processing and affective computing. However, individual differences in brain anatomy lead to non-negligible differences in EEG signals across subjects, posing challenges for cross-subject emotion recognition. While recent studies have attempted to address these issues, they still face limitations in practical effectiveness and in the unity of the model framework. Current methods often struggle to capture the complex spatial-temporal dynamics of EEG signals and fail to integrate multimodal information effectively, resulting in suboptimal performance and limited generalizability across subjects. To overcome these limitations, we develop a pre-trained-model-based multimodal Mood Reader for cross-subject emotion recognition that uses masked brain signal modeling and an interlinked spatial-temporal attention mechanism. The model learns universal latent representations of EEG signals through pre-training on a large-scale dataset, and employs the interlinked spatial-temporal attention mechanism to process Differential Entropy (DE) features extracted from EEG data. Subsequently, a multi-level fusion layer integrates the discriminative features, maximizing the advantages of features across different dimensions and modalities. Extensive experiments on public datasets demonstrate Mood Reader's superior performance in cross-subject emotion recognition tasks, outperforming state-of-the-art methods. Additionally, the model is dissected from an attention perspective, providing a qualitative analysis of emotion-related brain areas and offering valuable insights for affective research in neural signal processing.
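
As a rough illustration of the Differential Entropy (DE) features mentioned in the abstract, the sketch below computes DE per channel under the common assumption that a band-filtered EEG window is approximately Gaussian, in which case $DE = \frac{1}{2}\ln(2\pi e \sigma^2)$. The function name `differential_entropy`, the window size, and the synthetic input are assumptions made for illustration; band-pass filtering is omitted, and this is not taken from the authors' pipeline.

```python
import numpy as np

def differential_entropy(segment):
    """Per-channel DE for a (channels, samples) array.

    Assumes each channel is approximately Gaussian, so that
    DE = 0.5 * ln(2 * pi * e * variance).
    """
    variance = np.var(segment, axis=-1)
    return 0.5 * np.log(2.0 * np.pi * np.e * variance)

# Synthetic stand-in for one band-filtered EEG window (hypothetical sizes):
# 62 channels, 2000 samples, unit-variance Gaussian noise.
rng = np.random.default_rng(0)
segment = rng.standard_normal((62, 2000))

de = differential_entropy(segment)          # one DE value per channel
de_scaled = differential_entropy(2.0 * segment)
```

Under this assumption, doubling the signal amplitude shifts DE by exactly $\ln 2$, so the feature behaves like a log-power measure of each band.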

Paper Structure

This paper contains 22 sections, 9 equations, 4 figures, 1 table.

Figures (4)

  • Figure 1: The overall architecture of the proposed model and the flow of data through it.
  • Figure 2: Overview of the spatial interlink block. The temporal feature $X_T$ undergoes multiple transformations to align with the spatial feature $X_S$; after concatenation, the interlink process is completed through multi-head attention computation.
  • Figure 3: Attention visualization. (a) Layout of the 62-channel electrode placement used. (b, c, d, e) The model's attention weights at different moments during training, showing how the model's preference for EEG signals from electrodes at various locations evolves over time.
  • Figure 4: The results of the ablation studies, conducted through progressive stacking of modules.
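
The interlink step described in the Figure 2 caption (align $X_T$ with $X_S$, concatenate, then apply multi-head attention) can be sketched roughly as follows. This is only a minimal NumPy illustration, not the authors' implementation: the alignment is reduced to a single linear projection `w_align`, and all shapes, weight matrices, the head count, and the names `spatial_interlink` and `multi_head_self_attention` are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    shifted = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Scaled dot-product self-attention over x of shape (tokens, dim)."""
    tokens, dim = x.shape
    head_dim = dim // num_heads

    def split_heads(t):
        # (tokens, dim) -> (heads, tokens, head_dim)
        return t.reshape(tokens, num_heads, head_dim).transpose(1, 0, 2)

    q, k, v = split_heads(x @ w_q), split_heads(x @ w_k), split_heads(x @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)  # (heads, tokens, tokens)
    weights = softmax(scores, axis=-1)
    merged = (weights @ v).transpose(1, 0, 2).reshape(tokens, dim)
    return merged @ w_o, weights

def spatial_interlink(x_s, x_t, p, num_heads=4):
    # Assumption: "alignment" is one linear map from temporal to spatial width.
    x_t_aligned = x_t @ p["w_align"]                     # (n_t, d_s)
    joint = np.concatenate([x_s, x_t_aligned], axis=0)   # (n_s + n_t, d_s)
    out, weights = multi_head_self_attention(
        joint, p["w_q"], p["w_k"], p["w_v"], p["w_o"], num_heads
    )
    # Keep only the rows that correspond to the spatial tokens.
    return out[: x_s.shape[0]], weights

# Hypothetical sizes: 62 electrode tokens of width 32, 10 time tokens of width 16.
rng = np.random.default_rng(0)
d_s, d_t = 32, 16
x_s = rng.standard_normal((62, d_s))
x_t = rng.standard_normal((10, d_t))

params = {"w_align": rng.standard_normal((d_t, d_s)) * 0.1}
for name in ("w_q", "w_k", "w_v", "w_o"):
    params[name] = rng.standard_normal((d_s, d_s)) * 0.1

x_s_out, attn = spatial_interlink(x_s, x_t, params)
print(x_s_out.shape, attn.shape)  # (62, 32) (4, 72, 72)
```

In this sketch the spatial tokens attend over both spatial and aligned temporal tokens, which is one plausible reading of "interlink"; the per-head weights in `attn` are the kind of quantity that could be mapped back onto electrode positions for a visualization like Figure 3.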