Neuromorphic Valence and Arousal Estimation

Lorenzo Berlincioni, Luca Cultrera, Federico Becattini, Alberto Del Bimbo

TL;DR

The paper tackles continuous valence and arousal estimation from facial expressions using neuromorphic (event camera) data. It trains multiple frame- and video-based models on a synthetic neuromorphic analogue of the RGB AFEW-VA dataset created via a V2E simulator, enabling fully labeled neuromorphic data without extra annotation. The approach achieves state-of-the-art results on AFEW-VA and demonstrates zero-shot transfer to real event data (NEFER) for emotion recognition, validating both the data-generation pipeline and model generalization. Key contributions include a comparison of frame- and video-based architectures, an analysis of Temporal Binary Representation encoding with varying bit-depth $N$, and a practical zero-shot deployment scenario for neuromorphic affective computing, with potential impact on privacy-preserving, low-latency emotion analysis. The continuous valence-arousal targets are in the range $[-1,1]$, enabling fine-grained mood tracking from high-temporal-resolution event streams.
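To make the role of the bit-depth $N$ more concrete, the following is a minimal sketch of a Temporal Binary Representation style encoding. It is not the authors' implementation: the function names, the `(ts, xs, ys)` event layout, the bin length `delta_t`, the choice of the earliest bin as the most significant bit, and the normalization by $2^N - 1$ are all assumptions made for illustration. The idea shown is that events are first binned into $N$ binary frames (a pixel is 1 if at least one event fired there during the bin), and the $N$ bits at each pixel are then read as a single $N$-bit number.

```python
import numpy as np


def events_to_binary_frames(ts, xs, ys, height, width, n_bits, delta_t):
    """Bin events into n_bits binary frames, one per window of length delta_t.

    Assumed (hypothetical) layout: ts holds timestamps starting at 0,
    xs and ys hold integer pixel coordinates. Polarity is ignored.
    """
    ts = np.asarray(ts, dtype=np.float64)
    xs = np.asarray(xs, dtype=np.int64)
    ys = np.asarray(ys, dtype=np.int64)
    frames = np.zeros((n_bits, height, width), dtype=np.uint8)
    bins = np.floor_divide(ts, delta_t).astype(np.int64)
    keep = (bins >= 0) & (bins < n_bits)  # drop events outside the N bins
    frames[bins[keep], ys[keep], xs[keep]] = 1
    return frames


def pack_tbr(frames):
    """Read the N bits at each pixel as one number, scaled to [0, 1].

    Assumptions: the earliest bin is the most significant bit, and the
    result is divided by 2**N - 1.
    """
    n_bits = frames.shape[0]
    weights = 2.0 ** np.arange(n_bits - 1, -1, -1)
    packed = np.tensordot(weights, frames, axes=1)  # shape (height, width)
    return packed / (2.0 ** n_bits - 1.0)


# Tiny example: three events on a 1x2 sensor, N = 3 bins of length 1.0.
frames = events_to_binary_frames(
    ts=[0.0, 1.5, 2.1], xs=[0, 0, 1], ys=[0, 0, 0],
    height=1, width=2, n_bits=3, delta_t=1.0,
)
encoded = pack_tbr(frames)
```

Under these assumptions, a larger $N$ packs a longer time span into each encoded frame, while a smaller $N$ yields more frames that each cover a shorter interval.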

Abstract

Recognizing faces and their underlying emotions is an important aspect of biometrics, and estimating emotional states from faces has been tackled from several angles in the literature. In this paper, we follow the novel route of using neuromorphic data to predict valence and arousal values from faces. Because annotated event-based videos are difficult to gather, we leverage an event camera simulator to create the neuromorphic counterpart of an existing RGB dataset. We demonstrate not only that models trained on simulated data can still yield state-of-the-art results in valence-arousal estimation, but also that our trained models can be directly applied to real data, without further training, to address the downstream task of emotion recognition. We propose several alternative models to solve the task, both frame-based and video-based.

Paper Structure (12 sections, 5 figures, 5 tables)

This paper contains 12 sections, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Valence-Arousal unit circle. Values can be directly mapped into emotions [mikels2005emotional].
  • Figure 2: Illustration of RGB and event frames from a sample video, shown alongside its corresponding valence and arousal plot.
  • Figure 3: Qualitative samples for valence and arousal estimation on samples of the AFEW-VA dataset, obtained with the frame-based ResNet+Fusion model.
  • Figure 4: Qualitative samples for valence and arousal estimation on samples of the AFEW-VA dataset, obtained with the frame-based ResNet+Fusion model. Estimated and ground truth valence and arousal are shown as points on the wheel of emotions.
  • Figure 5: Compression artifacts visible after postprocessing on frame samples from AFEW-VA.
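
As a rough illustration of how a point on the valence-arousal circle of Figure 1 can be read, the sketch below converts a (valence, arousal) pair in $[-1,1]$ into an intensity and an angle, and assigns a coarse quadrant description. The function names and the quadrant descriptions are hypothetical and chosen only for this example; they are not the emotion categories or the mapping used in the paper.

```python
import math


def va_to_polar(valence, arousal):
    """Return (intensity, angle in degrees) for a valence-arousal pair.

    Intensity is the distance from the origin, clipped to the unit circle;
    the angle is measured counter-clockwise from the positive valence axis.
    """
    intensity = min(1.0, math.hypot(valence, arousal))
    angle = math.degrees(math.atan2(arousal, valence)) % 360.0
    return intensity, angle


def va_quadrant(valence, arousal):
    """Coarse, hypothetical quadrant description (not the paper's labels)."""
    v = "positive valence" if valence >= 0 else "negative valence"
    a = "high arousal" if arousal >= 0 else "low arousal"
    return f"{a}, {v}"


intensity, angle = va_to_polar(0.0, 1.0)
label = va_quadrant(-0.4, 0.7)
```

A finer mapping to named emotions would replace `va_quadrant` with angular sectors taken from the reference cited in Figure 1.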