EEG-based Multimodal Representation Learning for Emotion Recognition

Kang Yin, Hye-Bin Shin, Dan Li, Seong-Whan Lee

TL;DR

A novel multimodal framework that incorporates EEG data alongside conventional modalities such as video, images, and audio. It flexibly handles varying input sizes and dynamically adjusts attention to account for feature importance across modalities.

Abstract

Multimodal learning has been a popular area of research, yet integrating electroencephalogram (EEG) data poses unique challenges due to its inherent variability and limited availability. In this paper, we introduce a novel multimodal framework that accommodates not only conventional modalities such as video, images, and audio, but also incorporates EEG data. Our framework is designed to flexibly handle varying input sizes, while dynamically adjusting attention to account for feature importance across modalities. We evaluate our approach on a recently introduced emotion recognition dataset that combines data from three modalities, making it an ideal testbed for multimodal learning. The experimental results provide a benchmark for the dataset and demonstrate the effectiveness of the proposed framework. This work highlights the potential of integrating EEG into multimodal systems, paving the way for more robust and comprehensive applications in emotion recognition and beyond.

Paper Structure

This paper contains 8 sections, 1 equation, 2 figures, 1 table.

Figures (2)

  • Figure 1: Overview of the proposed attention-based multimodal emotion recognition framework, extracting EEG, audio, and visual features with specifically tailored transformers and integrating them through self-attention fusion.
  • Figure 2: Bar plot of subject-wise performance across modalities.
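
The Figure 1 caption describes integrating EEG, audio, and visual features through self-attention fusion. As a rough illustration of that general idea only, and not the authors' implementation, the sketch below applies plain scaled dot-product self-attention over one embedding per modality. Everything in it is an assumption made for illustration: the function names `softmax` and `self_attention_fusion`, the toy embedding values, the absence of learned query/key/value projections, and the mean pooling used to produce the fused vector.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_fusion(tokens):
    """Fuse per-modality embeddings with unparameterized self-attention.

    tokens: dict mapping a modality name to an embedding (list of floats);
            all embeddings are assumed to share the same length d.
    Returns (fused, importance): the mean of the attended embeddings, and
    the average attention weight each modality receives across all queries.
    """
    names = list(tokens)
    d = len(tokens[names[0]])
    scale = math.sqrt(d)

    attended = []
    importance = [0.0] * len(names)
    for query in names:
        # Scaled dot product between the query modality and every modality.
        scores = [
            sum(a * b for a, b in zip(tokens[query], tokens[key])) / scale
            for key in names
        ]
        weights = softmax(scores)
        # Attention-weighted sum of all modality embeddings for this query.
        attended.append(
            [sum(w * tokens[key][i] for w, key in zip(weights, names)) for i in range(d)]
        )
        for j, w in enumerate(weights):
            importance[j] += w / len(names)

    # Mean-pool the attended embeddings into a single fused representation.
    fused = [sum(row[i] for row in attended) / len(attended) for i in range(d)]
    return fused, dict(zip(names, importance))


# Toy embeddings (hypothetical values); any subset of modalities can be passed.
example_tokens = {
    "eeg": [0.2, -0.1, 0.4, 0.0],
    "audio": [0.1, 0.3, -0.2, 0.5],
    "visual": [0.0, 0.2, 0.1, -0.3],
}
fused_vector, modality_importance = self_attention_fusion(example_tokens)
```

Because the attention weights are computed over whichever modalities are present, the same function runs unchanged when a modality is missing. A real model would add learned projections and, most likely, more than one token per modality.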