Exploring Emotion Expression Recognition in Older Adults Interacting with a Virtual Coach

Cristina Palmero, Mikel deVelasco, Mohamed Amine Hmani, Aymen Mtibaa, Leila Ben Letaifa, Pau Buch-Cardona, Raquel Justo, Terry Amorese, Eduardo González-Fraile, Begoña Fernández-Ruanova, Jofre Tenorio-Laranga, Anna Torp Johansen, Micaela Rodrigues da Silva, Liva Jenny Martinussen, Maria Stylianou Korsnes, Gennaro Cordasco, Anna Esposito, Mounim A. El-Yacoubi, Dijana Petrovska-Delacrétaz, M. Inés Torres, Sergio Escalera

TL;DR

This paper outlines the development of the emotion expression recognition module of the EMPATHIC virtual coach, covering data collection, annotation design, and a first methodological approach, all tailored to the project requirements. It aims to contribute to the limited literature on emotion recognition applied to older adults in conversational human-machine interaction.

Abstract

The EMPATHIC project aimed to design an emotionally expressive virtual coach capable of engaging healthy seniors to improve well-being and promote independent aging. One of the core aspects of the system is its human sensing capabilities, allowing for the perception of emotional states to provide a personalized experience. This paper outlines the development of the emotion expression recognition module of the virtual coach, encompassing data collection, annotation design, and a first methodological approach, all tailored to the project requirements. With the latter, we investigate the role of various modalities, individually and combined, for discrete emotion expression recognition in this context: speech from audio, and facial expressions, gaze, and head dynamics from video. The collected corpus includes users from Spain, France, and Norway, and was annotated separately for the audio and video channels with distinct emotional labels, allowing for a performance comparison across cultures and label types. Results confirm the informative power of the modalities studied for the emotional categories considered, with multimodal methods generally outperforming others (around 68% accuracy with audio labels and 72-74% with video labels). The findings are expected to contribute to the limited literature on emotion recognition applied to older adults in conversational human-machine interaction.

Paper Structure (117 sections, 1 equation, 5 figures, 22 tables)


Figures (5)

  • Figure 1: Setup with a participant during an interaction session.
  • Figure 2: Representation of the segmentation of annotated emotion expression categories to create the gold standard for the audio modality. Happy corresponds to pleased/amused.
  • Figure 3: Overview of the methodological pipeline.
  • Figure 4: Per-country audio-based results, trained on the SP, FR, NO, or WH training sets and evaluated on the SP, FR, and NO test sets. Reported as unweighted average accuracy $\pm$ SEM over 10 folds and 3 runs per fold.
  • Figure 5: Per-country video-based results under (a) speech or (b) silence, trained on the SP, FR, NO, or WH training sets and evaluated on the SP, FR, and NO test sets. Reported as unweighted average accuracy $\pm$ SEM over 10 folds and 3 runs per fold.