MAD: A Multimodal and Multi-perspective Affective Dataset with Hierarchical Annotations

Shengwei Guo, Yunqing Qiao, Wenzhan Zhang, Bo Liu, Yong Wang, Guobing Sun

TL;DR

The experimental results demonstrate that MAD supports consistent and comparable performance across both unimodal and multimodal settings, establishing it as a reliable benchmark for emotion recognition and cross-modal affective analysis, and as a valuable resource for studying emotion mechanisms across multiple levels.

Abstract

This work presents MAD (Multimodal Affection Dataset), a multimodal emotion dataset designed for affective computing and neurophysiological modeling. MAD is built upon synchronous collection of diverse physiological signals (EEG, ECG, EOG, EMG, PPG, and BCG) together with tri-view RGB-D facial videos, enabling the observation of emotional dynamics from neural, physiological, and behavioral perspectives. The dataset consists of synchronized recordings from 18 participants and introduces two key contributions. First, it provides temporally aligned multimodal data that jointly capture central neural activity, peripheral physiological responses, and overt facial expressions. Second, it incorporates a three-level emotion annotation framework spanning stimulus elicitation, subjective cognition, and behavioral expression, supporting joint modeling of the full emotion process. To validate the dataset, we conduct systematic benchmark experiments covering intra-subject EEG emotion recognition, cross-subject EEG transfer learning, consistency analysis and emotion classification with cardiac-related signals, multimodal physiological fusion, and multi-view facial emotion recognition. The experimental results demonstrate that MAD supports consistent and comparable performance across both unimodal and multimodal settings, establishing it as a reliable benchmark for emotion recognition and cross-modal affective analysis, and as a valuable resource for studying emotion mechanisms across multiple levels.

Paper Structure (29 sections, 2 equations, 7 figures, 7 tables)

Figures (7)

  • Figure 1: Overview of the experimental pipeline. The left panel illustrates the emotion elicitation stage based on video stimuli (Stimulus Elicitation), the middle panel shows the synchronized acquisition of multimodal physiological and visual signals during stimulus presentation (Multimodal Data Acquisition), and the right panel depicts the hierarchical emotion annotation process, including stimulus-based, self-reported, and expression-based labels (Multi-level Emotion Annotation).
  • Figure 2: Schematic illustration of electrode placement for multimodal physiological recordings. Six channels were recorded using the BIOPAC MP150 system, including one EOG, two ECG, and three EMG channels, with sensor locations indicated on the front and back schematic views.
  • Figure 3: Pipeline of RGB-D data processing for constructing the multi-view facial emotion recognition dataset. The procedure includes raw video acquisition, face detection, ROI extraction, emotion classification, filtering of expressive frames, manual labeling, and final dataset generation.
  • Figure 4: Representative waveforms of ECG, PPG, and BCG signals with annotated peaks. Subplots (a)–(c) show ECG, PPG, and BCG signals, respectively, with R-peaks in ECG, Systolic-peaks in PPG, and J-peaks in BCG indicated. The 10-second segments were selected from 30-second synchronous recordings to analyze heart cycle consistency.
  • Figure 5: Comparison of peak-to-peak intervals for ECG, PPG, and BCG signals. The plot shows R-R intervals in ECG, Systolic-Systolic intervals in PPG, and J-J intervals in BCG, illustrating consistent heart cycles across modalities.
  • ...and 2 more figures
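
The peak-to-peak comparison described in the captions of Figures 4 and 5 can be illustrated with a minimal sketch. This is not the authors' pipeline: the signals are synthetic pulse trains, and the sampling rate, the per-modality delays, and the helper names (`synthetic_pulse_train`, `find_peaks`, `peak_intervals`) are all assumptions made for illustration. It only shows that peaks from three cardiac-related signals, each shifted by a different delay, yield the same beat-to-beat intervals.

```python
import math

FS = 250  # assumed sampling rate in Hz (not taken from the paper)


def synthetic_pulse_train(fs, duration_s, period_s, delay_s, width_s=0.03):
    """Gaussian pulses every period_s seconds, the first one at delay_s."""
    samples = []
    for k in range(int(fs * duration_s)):
        t = k / fs
        phase = (t - delay_s) % period_s
        dist = min(phase, period_s - phase)  # distance to nearest pulse centre
        samples.append(math.exp(-0.5 * (dist / width_s) ** 2))
    return samples


def find_peaks(signal, fs, min_height, min_distance_s):
    """Indices of local maxima above min_height, at least min_distance_s apart."""
    min_gap = int(min_distance_s * fs)
    peaks = []
    for i in range(1, len(signal) - 1):
        is_local_max = signal[i - 1] < signal[i] >= signal[i + 1]
        if is_local_max and signal[i] >= min_height:
            if not peaks or i - peaks[-1] >= min_gap:
                peaks.append(i)
    return peaks


def peak_intervals(peaks, fs):
    """Successive peak-to-peak intervals in seconds."""
    return [(b - a) / fs for a, b in zip(peaks, peaks[1:])]


# Hypothetical delays (seconds) standing in for the different latencies of
# R-peaks (ECG), systolic peaks (PPG), and J-peaks (BCG) within each beat.
delays = {"ECG (R-R)": 0.20, "PPG (S-S)": 0.44, "BCG (J-J)": 0.32}

intervals = {}
for name, delay in delays.items():
    sig = synthetic_pulse_train(FS, duration_s=10.0, period_s=0.8, delay_s=delay)
    peaks = find_peaks(sig, FS, min_height=0.5, min_distance_s=0.3)
    intervals[name] = peak_intervals(peaks, FS)

means = {name: sum(v) / len(v) for name, v in intervals.items()}
for name, m in means.items():
    print(f"{name}: mean interval {m:.3f} s, {60.0 / m:.1f} bpm")
```

On real recordings the peak detector would need filtering and modality-specific thresholds; the fixed height and refractory gap used here are only adequate for clean synthetic pulses.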