MVGT: A Multi-view Graph Transformer Based on Spatial Relations for EEG Emotion Recognition
Yanjie Cui, Xiaohong Liu, Jing Liang, Yamin Fu
TL;DR
This work tackles EEG-based emotion recognition by exploiting multi-domain information (temporal, frequency, and spatial) through a multi-view graph transformer (MVGT). MVGT represents EEG data as a graph of channels. It uses differential entropy-based features for frequency information, segment-level temporal tokens, and three spatial encodings (BRE, CE, GSE) that bias attention, and it refines representations through iterative recycling. On the SEED and SEED-IV datasets, MVGT achieves state-of-the-art accuracy. Ablation studies identify the temporal embedding and spatial encodings as key drivers of performance and offer insights into the involvement of distributed brain networks in emotion processing. The approach shows potential for robust, interpretable EEG emotion recognition and highlights the value of integrating cross-domain information via graph transformers in affective computing.
Abstract
Electroencephalography (EEG), a technique that records electrical activity from the scalp using electrodes, plays a vital role in affective computing. However, fully utilizing the multi-domain characteristics of EEG signals remains a significant challenge. Traditional single-perspective analyses often fail to capture the complex interplay of temporal, frequency, and spatial dimensions in EEG data. To address this, we introduce a multi-view graph transformer (MVGT) based on spatial relations that integrates information across three domains: temporal dynamics from continuous series, frequency features extracted from frequency bands, and inter-channel relationships captured through several spatial encodings. This comprehensive approach allows the model to capture the nuanced properties inherent in EEG signals, enhancing its flexibility and representational power. Evaluation on publicly available datasets demonstrates that MVGT surpasses state-of-the-art methods. The results highlight its ability to extract multi-domain information and effectively model inter-channel relationships, showcasing its potential for EEG-based emotion recognition tasks.
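
To make two of the ingredients above concrete, the following is a minimal sketch rather than the authors' implementation. It assumes that a band-filtered EEG signal is approximately Gaussian, in which case its differential entropy has the closed form 0.5 * ln(2 * pi * e * variance), and it illustrates a spatial encoding as an additive bias on attention scores between channels. The function names, the toy signal, and the bias values are hypothetical and chosen only for illustration.

```python
import math
from statistics import pvariance


def differential_entropy(samples):
    """Differential entropy of one band-filtered channel segment.

    Assumes the samples are Gaussian, so DE = 0.5 * ln(2 * pi * e * variance).
    """
    return 0.5 * math.log(2 * math.pi * math.e * pvariance(samples))


def biased_attention_weights(scores, bias):
    """Row-wise softmax of (scores + bias).

    `scores` holds raw channel-to-channel attention scores and `bias` holds
    a spatial term of the same shape (hypothetical values here); both are
    square lists of lists.
    """
    weights = []
    for score_row, bias_row in zip(scores, bias):
        logits = [s + b for s, b in zip(score_row, bias_row)]
        peak = max(logits)  # subtract the row maximum for numerical stability
        exps = [math.exp(logit - peak) for logit in logits]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# Toy segment with variance exactly 1 (hypothetical data).
band_signal = [1.0, -1.0, 1.0, -1.0]
de = differential_entropy(band_signal)

# Two channels with equal raw scores; the bias penalizes the cross-channel pair.
scores = [[0.0, 0.0], [0.0, 0.0]]
bias = [[0.0, -1.0], [-1.0, 0.0]]
weights = biased_attention_weights(scores, bias)
```

In this toy setup the raw scores are identical, so any difference in the attention weights comes purely from the additive bias, which is the sense in which a spatial encoding can steer attention between channels.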
