MVGT: A Multi-view Graph Transformer Based on Spatial Relations for EEG Emotion Recognition

Yanjie Cui, Xiaohong Liu, Jing Liang, Yamin Fu

TL;DR

This work tackles EEG-based emotion recognition by exploiting multi-domain information (temporal, frequency, and spatial) through a novel multi-view graph transformer (MVGT). MVGT represents EEG data as a graph of channels, using differential entropy-based features for frequency information, segment-level temporal tokens, and three spatial encodings (BRE, CE, GSE) to bias attention, with iterative recycling to refine representations. On SEED and SEED-IV datasets, MVGT achieves state-of-the-art accuracy, with ablation studies showing the temporal embedding and spatial encodings as key drivers of performance and insights into distributed brain network involvement in emotion processing. The approach demonstrates strong potential for robust, interpretable EEG emotion recognition and highlights the value of integrating cross-domain information via graph transformers in affective computing.

Abstract

Electroencephalography (EEG), a technique that records electrical activity from the scalp using electrodes, plays a vital role in affective computing. However, fully utilizing the multi-domain characteristics of EEG signals remains a significant challenge. Traditional single-perspective analyses often fail to capture the complex interplay of temporal, frequency, and spatial dimensions in EEG data. To address this, we introduce a multi-view graph transformer (MVGT) based on spatial relations that integrates information across three domains: temporal dynamics from continuous series, frequency features extracted from frequency bands, and inter-channel relationships captured through several spatial encodings. This comprehensive approach allows the model to capture the nuanced properties inherent in EEG signals, enhancing its flexibility and representational power. Evaluation on publicly available datasets demonstrates that MVGT surpasses state-of-the-art methods in performance. The results highlight its ability to extract multi-domain information and effectively model inter-channel relationships, showcasing its potential for EEG-based emotion recognition tasks.

Paper Structure (22 sections, 15 equations, 5 figures, 2 tables)

Figures (5)

  • Figure 1: Overall structure of MVGT. (a) shows the brain region encoding, centrality encoding, and geometric structure encoding processes. (b) depicts how inter-channel correlations are computed from the attention mechanism and the geometric structure encoding. "Recycling" refers to the iterative refinement (see the implementation section).
  • Figure 2: The brain region division schemes are illustrated. (a) LOBE scheme shows a coarse partitioning based on lobe structures. (b) GENERAL scheme represents a fine-grained partitioning of the brain lobes. (c) FRONTAL scheme introduces symmetry of the left and right frontal regions. (d) HEMISPHERE scheme further enhances the channel symmetry in the partitioning scheme. Channels of the same color belong to the same brain region.
  • Figure 3: Confusion matrices of MVGT. (a) Confusion matrix of MVGT-F on SEED. (b) Confusion matrix of MVGT-G on SEED-IV. Each row of the matrix represents the true labels, and each column represents the predicted labels.
  • Figure 4: Inter-channel relationships learned by MVGT-F on SEED and by MVGT-G on SEED-IV. The figures show the results of the last iteration of the iterative refinement, highlighting the 10 channel pairs with the highest weights after softmax (darker colors indicate higher weights). Channels in the same brain region are shown in the same color. Rows correspond to attention heads, and columns correspond to the layers of the MHA mechanism.
  • Figure 5: The temporal approach is illustrated. (a) represents the DE data, (b) represents the time segments obtained by a sliding window, (c) shows the default method of treating multi-channel data at a single time point as a token, while (d) illustrates the "Inverted" method, where the entire continuous time segment is treated as a token.
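
The two tokenization schemes described for Figure 5 can be made concrete with a small sketch. It is illustrative only, not the authors' implementation: the array layout `(time, channels, bands)`, the window length, the stride, and the function names `sliding_segments`, `default_tokens`, and `inverted_tokens` are all assumptions introduced here. It also assumes that in the "Inverted" method each channel's whole time segment becomes one token, which matches the description of MVGT as a graph of channels.

```python
import numpy as np


def sliding_segments(de, window, stride):
    """Cut a (T, C, B) array of DE features into overlapping segments.

    Returns an array of shape (N, window, C, B), where N is the number
    of window positions that fit entirely inside the recording.
    """
    num_steps = de.shape[0]
    starts = range(0, num_steps - window + 1, stride)
    return np.stack([de[s:s + window] for s in starts])


def default_tokens(segment):
    """Default scheme: one token per time point.

    Each token concatenates all channels and bands observed at that
    time point, giving shape (window, C * B).
    """
    window, channels, bands = segment.shape
    return segment.reshape(window, channels * bands)


def inverted_tokens(segment):
    """"Inverted" scheme (assumed per channel): one token per channel.

    Each token concatenates that channel's features over the whole
    segment, giving shape (C, window * B).
    """
    window, channels, bands = segment.shape
    return segment.transpose(1, 0, 2).reshape(channels, window * bands)


# Hypothetical sizes: 10 time steps, 62 channels, 5 frequency bands.
de = np.arange(10 * 62 * 5, dtype=np.float64).reshape(10, 62, 5)
segments = sliding_segments(de, window=4, stride=2)
tokens_default = default_tokens(segments[0])
tokens_inverted = inverted_tokens(segments[0])
```

With the inverted layout the sequence length seen by the transformer equals the number of channels, so attention weights can be read directly as channel-to-channel relationships.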