EEG Emotion Classification Using an Enhanced Transformer-CNN-BiLSTM Architecture with Dual Attention Mechanisms

S M Rakib UI Karim; Wenyi Lu; Diponkor Bala; Rownak Ara Rasul; Sean Goggins

TL;DR

The paper addresses EEG-based emotion recognition, a field challenged by high dimensionality, noise, and subject variability, by introducing an Enhanced Transformer-CNN-BiLSTM architecture with dual attention to model spatial-temporal EEG dynamics. The method is evaluated on a public dataset of $N=2{,}529$ samples with $D=988$ features across three emotion classes, using a rigorous 5-fold cross-validation protocol, achieving a test accuracy of $99.19\%$ and a training/test gap of $0.56\%$, indicating strong generalization. Feature analyses reveal covariance-based features as the most informative, with SHAP attributions corroborating their importance and ablation showing large performance drops when removing covariance information. Overall, the work demonstrates state-of-the-art, robust, and interpretable EEG emotion classification with potential for real-time, clinical deployment and suggests future extensions to multimodal data and cross-subject robustness.
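As a rough illustration of what a 5-fold protocol implies for a dataset of this size, the sketch below partitions $N=2{,}529$ sample indices into five test folds. It is a minimal sketch and not the authors' code: the helper name `k_fold_indices`, the shuffling seed, and the use of plain (non-stratified) folds are all assumptions, since the summary does not say how the folds were built.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Return k (train_idx, test_idx) pairs; each sample is tested exactly once.

    Hypothetical helper for illustration; plain shuffled folds are an assumption.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    # The first (n_samples % k) folds receive one extra sample.
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(indices[start:start + size])
        start += size

    splits = []
    for i in range(k):
        test_idx = folds[i]
        # Training set is the union of the other k - 1 folds.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train_idx, test_idx))
    return splits


splits = k_fold_indices(2529, k=5)
fold_sizes = [len(test_idx) for _, test_idx in splits]
print(fold_sizes)  # [506, 506, 506, 506, 505]
```

Under these assumptions each model is trained on roughly 2,023 samples and tested on roughly 506. With three emotion classes, a stratified split that preserves class proportions in every fold would be a natural alternative, though the summary does not state which variant was used.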

Abstract

Electroencephalography (EEG)-based emotion recognition plays a critical role in affective computing and emerging decision-support systems, yet remains challenging due to high-dimensional, noisy, and subject-dependent signals. This study investigates whether hybrid deep learning architectures that integrate convolutional, recurrent, and attention-based components can improve emotion classification performance and robustness in EEG data. We propose an enhanced hybrid model that combines convolutional feature extraction, bidirectional temporal modeling, and self-attention mechanisms with regularization strategies to mitigate overfitting. Experiments conducted on a publicly available EEG dataset spanning three emotional states (neutral, positive, and negative) demonstrate that the proposed approach achieves state-of-the-art classification performance, significantly outperforming classical machine learning and neural baselines. Statistical tests confirm the robustness of these performance gains under cross-validation. Feature-level analyses further reveal that covariance-based EEG features contribute most strongly to emotion discrimination, highlighting the importance of inter-channel relationships in affective modeling. These findings suggest that carefully designed hybrid architectures can effectively balance predictive accuracy, robustness, and interpretability in EEG-based emotion recognition, with implications for applied affective computing and human-centered intelligent systems.

Paper Structure (57 sections, 15 equations, 10 figures, 5 tables, 1 algorithm)

Introduction
Related Work
Methodology
Dataset Description
Data Preprocessing
Proposed Model Architecture
Training Procedure
Evaluation Metrics
Results
Performance, Robustness, and Comparative Evaluation
Baseline Model Performance
Enhanced Hybrid Architecture Performance
Overall Performance
Per-Class Performance
Statistical Significance Analysis
...and 42 more sections

Figures (10)

Figure 1: Enhanced Transformer-CNN-BiLSTM architecture. The seven-stage pipeline comprises: (1) input processing of EEG features, (2) CNN feature extraction with residual connections, (3) bidirectional LSTM temporal modeling, (4) dual multi-head attention mechanisms, (5) dual pooling strategy, (6) deep fully connected classifier, and (7) softmax output for emotion classification. Key innovations include residual CNN blocks, dual attention layers (16 and 8 heads), and advanced regularization.
Figure 2: Overview of the proposed Enhanced Transformer-CNN-BiLSTM model. The network integrates spatial feature extraction (CNN), temporal sequence modeling (BiLSTM), and dual multi-head attention for importance weighting, followed by a dense classifier for emotion prediction.
Figure 3: Confusion Matrices for Baseline Models Showing Per-Class Classification Performance
Figure 4: Model Performance Comparison with Statistical Significance and Per-Class Analysis
Figure 5: Top 15 Features by Consensus Importance Across Five Methods
...and 5 more figures
