Modelling the Interplay of Eye-Tracking Temporal Dynamics and Personality for Emotion Detection in Face-to-Face Settings

Meisam J. Seikavandi, Jostein Fimland, Fabricio Batista Narcizo, Maria Barrett, Ted Vucurevich, Jesper Bünsow Boldt, Andrew Burke Dittberner, Paolo Burelli

TL;DR

This work tackles dynamic emotion recognition in face-to-face-like settings by distinguishing perceived and felt emotions from a listener perspective. It proposes a personality-aware multimodal architecture that fuses temporal eye-tracking with Big Five traits and contextual stimulus cues from talking-face stimuli. Empirical results with 73 participants show that stimulus cues boost perceived-emotion predictions while personality traits substantially improve felt-emotion recognition, with macro F1 up to 0.58 for felt valence. These findings support a layered BET–TCE framework and point to more personalized, ecologically valid affective computing systems.

Abstract

Accurate recognition of human emotions is critical for adaptive human-computer interaction, yet it remains challenging in dynamic, conversation-like settings. This work presents a personality-aware multimodal framework that integrates eye-tracking sequences, Big Five personality traits, and contextual stimulus cues to predict both perceived and felt emotions. Seventy-three participants viewed speech-containing clips from the CREMA-D dataset while their eye movements were recorded; they also completed personality assessments and provided emotion ratings. Our neural models captured temporal gaze dynamics and fused them with trait and stimulus information, yielding consistent gains over SVM and literature baselines. Results show that (i) stimulus cues strongly enhance perceived-emotion prediction (macro F1 up to 0.77), while (ii) personality traits provide the largest improvements for felt-emotion recognition (macro F1 up to 0.58). These findings highlight the benefit of combining physiological, trait-level, and contextual information to address the inherent subjectivity of emotion. By distinguishing between perceived and felt responses, our approach advances multimodal affective computing and points toward more personalized and ecologically valid emotion-aware systems.
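The results above are reported as macro F1, which averages the per-class F1 scores with equal weight, so an under-represented emotion class counts as much as a frequent one. The sketch below shows how that metric is computed in plain Python. It is an illustration of the standard definition, not the authors' evaluation code; the function name `macro_f1` and the three valence labels ("low", "neutral", "high") are hypothetical, and the paper's actual label set and binning may differ.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in y_true or y_pred."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # predicted p, but it was wrong
            fn[t] += 1          # missed the true class t
    # Per-class F1 = 2TP / (2TP + FP + FN), equivalent to 2PR / (P + R)
    f1s = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        f1s.append(2 * tp[c] / denom if denom else 0.0)
    return sum(f1s) / len(f1s)


# Hypothetical felt-valence ratings binned into three classes
y_true = ["low", "low", "neutral", "high", "high", "high"]
y_pred = ["low", "neutral", "neutral", "high", "low", "high"]
print(round(macro_f1(y_true, y_pred), 3))  # 0.656
```

Here the per-class F1 scores are 0.5 (low), 0.667 (neutral), and 0.8 (high); the macro average weights them equally even though "high" has three true instances and "neutral" only one.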

Paper Structure

This paper contains 30 sections, 5 figures, and 4 tables.

Figures (5)

  • Figure 1: The experimental setup of a participant seated in front of the monitor with sensors attached.
  • Figure 2: A frame from the CREMA-D dataset cao2014crema used in the experiment.
  • Figure 3: The 9-point scales used to rate emotional arousal and valence.
  • Figure 4: Facial landmarks extracted by OpenFace amos2016openface, partitioned into multiple regions of interest (ROIs).
  • Figure 5: Neural network architecture integrating eye-tracking data, environmental variables, personality traits, and stimulus emotion.