DEAP DIVE: Dataset Investigation with Vision transformers for EEG evaluation

Annemarie Hoffsommer, Helen Schneider, Svetlana Pavlitska, J. Marius Zöllner

TL;DR

This work demonstrates that emotion recognition from EEG can be achieved effectively with only a subset of channels by converting each channel signal into a scaleogram via Continuous Wavelet Transform and classifying the scaleograms with a Vision Transformer. The study shows that 12-channel configurations, particularly the Emotiv subset, reach around 91.5% accuracy, approaching the state-of-the-art result of 96.9% obtained with the traditional 32-channel setup, and that even single-channel eye-movement signals can yield meaningful predictive performance. It also provides an initial baseline for regression on DEAP using EEG data, with a reported RMSE of around 0.57–0.98 across configurations, and discusses the effects of the labeling scheme (VAQ vs. SAM) and the challenges of interpreting channel contributions. These findings support the feasibility of portable, low-cost EEG systems for affective computing and outline future work on explainable AI and cross-device generalization.

Abstract

Accurately predicting emotions from brain signals has the potential to achieve goals such as improving mental health, human-computer interaction, and affective computing. Emotion prediction through neural signals offers a promising alternative to traditional methods, such as self-assessment and facial expression analysis, which can be subjective or ambiguous. Measurements of brain activity via electroencephalogram (EEG) provide a more direct and unbiased data source. However, conducting a full EEG is a complex, resource-intensive process, which has led to the rise of low-cost EEG devices with simplified measurement capabilities. This work examines how subsets of EEG channels from the DEAP dataset can be used for sufficiently accurate emotion prediction with low-cost EEG devices, rather than fully equipped EEG measurements. Using Continuous Wavelet Transformation to convert EEG data into scaleograms, we trained a vision transformer (ViT) model for emotion classification. The model achieved over 91.57% accuracy in predicting 4 quadrants (high/low per arousal and valence) with only 12 measuring points (also referred to as channels). Our work clearly shows that a significant reduction of input channels still yields high accuracy compared to state-of-the-art results of 96.9% with 32 channels. Training scripts to reproduce our results can be found here: https://gitlab.kit.edu/kit/aifb/ATKS/public/AutoSMiLeS/DEAP-DIVE.
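The scaleogram step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (their training scripts are in the linked repository). It assumes a hand-rolled complex Morlet wavelet in NumPy, a 128 Hz sampling rate, a 4–45 Hz frequency grid, and synthetic sinusoids in place of real EEG channels. The function name `morlet_scaleogram` and the parameter `w0` are hypothetical.

```python
import numpy as np

def morlet_scaleogram(signal, freqs_hz, fs, w0=6.0):
    """Magnitude of a Morlet CWT, shape (len(freqs_hz), len(signal)).

    Assumption: a complex Morlet wavelet with centre parameter w0;
    the wavelet actually used in the paper is not stated here.
    """
    signal = np.asarray(signal, dtype=float)
    n = signal.size
    out = np.empty((len(freqs_hz), n))
    for i, f in enumerate(freqs_hz):
        s = w0 / (2.0 * np.pi * f)           # scale (seconds) matching frequency f
        half = int(np.ceil(4.0 * s * fs))    # cover +/- 4 envelope widths
        t = np.arange(-half, half + 1) / fs
        wavelet = (np.pi ** -0.25) * np.exp(1j * w0 * t / s) \
            * np.exp(-0.5 * (t / s) ** 2) / np.sqrt(s)
        # Correlation with the wavelet, written as a convolution.
        kernel = np.conj(wavelet)[::-1]
        full = np.convolve(signal, kernel, mode="full") / fs
        out[i] = np.abs(full[half:half + n])  # centre part, same length as input
    return out

# Synthetic stand-in for 4 EEG channels: one sinusoid per channel.
fs = 128.0                                    # assumed sampling rate in Hz
t = np.arange(0.0, 4.0, 1.0 / fs)             # 4 seconds of signal
channels = np.stack([np.sin(2 * np.pi * f * t) for f in (6.0, 10.0, 20.0, 40.0)])

freqs = np.arange(4.0, 46.0)                  # assumed 4..45 Hz analysis band
# One scaleogram per channel, stacked as (channels, frequencies, time).
scaleograms = np.stack([morlet_scaleogram(ch, freqs, fs) for ch in channels])
```

Each slice `scaleograms[c]` is a 2-D time-frequency image for one channel, which is the kind of input a vision model can consume. How the per-channel images are resized and combined for the ViT is not specified in the abstract and is left out of this sketch.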

Paper Structure

This paper contains 19 sections, 8 figures, 7 tables.

Figures (8)

  • Figure 1: Concept diagram for exemplary approach with 4 channels. Each input consists of only 4 channels with a recording length of one video watched by one person. Each channel is transformed into a separate scaleogram through a CWT. All 4 scaleograms are used as input for the ViT.
  • Figure 2: Russell's Circumplex Model of Affect [dabas_emotion_2018], showing the four quadrants (Q) used for our classification.
  • Figure 3: International 10-20 system for EEG channel placement, modified from [oostenveld_five_2001] to highlight the channels recorded in DEAP.
  • Figure 4: EEG placement and the corresponding brain regions [maria_volodina_nikolai_smetanin_cortical_2021].
  • Figure 5: Classification experiments with VAQ labels (labels given by the experimenters). The results of Tables \ref{tab:Ungrouped_singles}, \ref{tab:ungrouped SAM} and \ref{tab:ungrouped_experiments_Buchstaben} are presented visually, showing the locality, spread and skewness of the 5 folds of each experiment through their quartiles. Experiments are ordered by descending mean accuracy ($\uparrow$) over their 5 folds. The mean (Ø) is shown after the experiment name in the labels of the horizontal axis. The dashed red line marks the threshold of twice the accuracy of random guessing. The first 6 experiments lie clearly above the threshold line in all 5 folds. Outliers that differ significantly from the other folds are depicted as circles. Interestingly, the grouped "O" channels perform only slightly below the threshold, with a mean of 44.94%, even though the group contains only 3 channels. The Emotiv channels perform best, with a mean accuracy of 90.3% using only 12 channels as input.
  • ...and 3 more figures
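
The four-quadrant target mentioned in the captions of Figures 2 and 5 can be sketched as a mapping from a valence and an arousal rating to a class label. This is a minimal illustration, not the paper's labeling code. It assumes ratings on a 1–9 scale split at the midpoint 5, and a counter-clockwise quadrant numbering starting at high valence / high arousal. The function name `quadrant` and both the threshold and the numbering are assumptions.

```python
def quadrant(valence, arousal, threshold=5.0):
    """Map a (valence, arousal) rating pair to a quadrant label 1..4.

    Assumed convention (not confirmed by the source):
      1 = high valence, high arousal
      2 = low valence,  high arousal
      3 = low valence,  low arousal
      4 = high valence, low arousal
    """
    high_v = valence > threshold
    high_a = arousal > threshold
    if high_v and high_a:
        return 1
    if high_a:          # low valence, high arousal
        return 2
    if not high_v:      # low valence, low arousal
        return 3
    return 4            # high valence, low arousal

# With four classes, random guessing on balanced data gives 1/4 accuracy,
# so a "twice random" threshold would sit at 1/2 under that assumption.
chance_level = 1 / 4
double_chance = 2 * chance_level

labels = [quadrant(v, a) for v, a in [(7.0, 7.5), (3.0, 8.0), (2.0, 2.5), (8.0, 3.0)]]
```

Under the balanced-class assumption, the 44.94% mean reported for the "O" channels falls just below this 50% line, which matches the description in the Figure 5 caption.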