From Coarse to Fine-Grained Emotion Annotation: An Immediate Recall Paradigm with Validation through Physiological Evidence and Recognition Performance

Hao Tang, Songyun Xie, Xinzhou Xie, Can Liao, Xin Zhang, Bohan Li, Zhongyu Tian, Dalu Zheng

TL;DR

This work addresses the label noise problem in video-induced emotion datasets by introducing an immediate recall paradigm that enables fine-grained, timestamped emotion annotations anchored to the moment of subjective experience. The FIRMED dataset combines synchronized EEG, ECG, GSR, PPG, and facial data with an immediate replay phase to mark discrete event timestamps $t_{\text{event}}$ within a precise 4-s window, validated against robust CNS and ANS physiological markers. Results show that models trained on FIRMED labels outperform those trained on traditional whole-trial labels across EEG and multimodal configurations, with notable gains when fusion modalities are used and when the window is centered on $t_{\text{event}}$. The findings demonstrate that annotation precision can outweigh data scale in determining emotion recognition performance, and the approach reduces annotation uncertainty compared with delayed recall methods. This paradigm advances ecologically valid emotion labeling and has implications for developing more reliable affective computing systems in real-world settings.

Abstract

Traditional video-induced emotion physiological datasets often use whole-trial annotation, assigning a single emotion label to all data collected during an entire trial. This coarse-grained approach is misaligned with the dynamic and temporally localized nature of emotional responses as they unfold with video narratives, and it introduces label noise that limits both the evaluation and the performance of emotion recognition algorithms. To address this label noise, we propose a fine-grained annotation method based on an immediate recall paradigm. The paradigm adds an immediate video replay phase after the initial stimulus viewing, allowing participants to mark the onset timestamp, emotion label, and intensity precisely from immediate recall. We validate the paradigm through physiological evidence and recognition performance. Physiological validation of multimodal signals within participant-marked windows revealed rhythm-specific EEG patterns and arousal-dependent GSR responses, with skin conductance responses (SCRs) appearing in 91% of high-arousal versus 6% of low-arousal emotion windows. These objective physiological changes aligned strongly with the subjective annotations, confirming annotation precision. For recognition performance, classification experiments showed that models trained on fine-grained annotations achieved 9.7% higher accuracy than models trained on traditional whole-trial labels, despite using less data. This work not only addresses label noise through fine-grained annotation but also demonstrates that annotation precision outweighs data scale in determining emotion recognition performance.
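
The contrast between whole-trial and fine-grained labeling can be illustrated with a minimal sketch of event-centered windowing. This is not the authors' pipeline: the function names (`event_window`, `fine_grained_samples`), the array layout, and the choice to skip events whose window would run past the trial boundary are all assumptions made for illustration. Only the 4-s window length and the idea of centering on a participant-marked timestamp come from the text above.

```python
import numpy as np

def event_window(signal, fs, t_event, width_s=4.0):
    """Return the width_s-second slice of `signal` centered on t_event.

    signal  : array of shape (n_channels, n_samples) for one trial
    fs      : sampling rate in Hz
    t_event : participant-marked time in seconds from trial start
    Returns None when the window would extend outside the trial
    (a hypothetical policy; the paper does not state how edges are handled).
    """
    half = int(round(width_s * fs / 2))
    center = int(round(t_event * fs))
    start, stop = center - half, center + half
    if start < 0 or stop > signal.shape[-1]:
        return None
    return signal[..., start:stop]

def fine_grained_samples(signal, fs, events, width_s=4.0):
    """Build (window, label, intensity) samples from marked events.

    events : iterable of (t_event, label, intensity) tuples (assumed format)
    """
    samples = []
    for t_event, label, intensity in events:
        window = event_window(signal, fs, t_event, width_s)
        if window is not None:
            samples.append((window, label, intensity))
    return samples
```

Under whole-trial labeling every sample of `signal` would inherit one label; here only the data around each marked event is kept, which is why the fine-grained training set is smaller than the whole-trial one.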

Paper Structure

This paper contains 21 sections, 7 equations, 8 figures, 6 tables.

Figures (8)

  • Figure 1: Neuracle NeuroHUB wearable acquisition device wearing method and EEG electrode positions.
  • Figure 2: Experimental paradigm for FIRMED data collection, showing the multi-session design, the procedure within a single trial (including baseline, video stimulus presentation, immediate replay with recall-based discrete annotation, and rest), the multimodal physiological and behavioral signals recorded, and the dimensions of emotion annotation (label, intensity, time).
  • Figure 3: Emotion annotation software interface. Software functions include playing videos, adjusting video progress, recording video progress time, emotion category and emotion intensity.
  • Figure 4: Screenshots of real-time recordings of multisystem physiological responses during a fear experiment, showing baseline, unlabeled data, and labeled "fear" data. EEG, GSR, and ECG show minimal fluctuations during the baseline and unlabeled periods. In contrast, significant and synchronized physiological responses are specifically locked to the subjectively labeled moments of fear ($t_{\text{event}_1}$, $t_{\text{event}_2}$), demonstrating the temporal precision of the annotation method.
  • Figure 5: Event-related EEG PSD change T-maps (4-s window) for six emotions across five frequency bands. White circles indicate channels in significant clusters (cluster permutation test, $p < 0.05$). Red/blue indicates PSD increase/decrease. The figure compares neural signatures across emotions.
  • ...and 3 more figures