Emotion recognition in talking-face videos using persistent entropy and neural networks

Eduardo Paluzo-Hidalgo, Guillermo Aguirre-Carrazana, Rocio Gonzalez-Diaz

TL;DR

This work develops a novel approach, using persistent entropy and neural networks as main tools, to recognise and classify emotions from talking-face videos, by combining audio-signal and image-sequence information to compute a topology signature (a 9-dimensional vector) for each video.

Abstract

The automatic recognition of a person's emotional state has become a very active research field that involves scientists specialized in different areas such as artificial intelligence, computer vision, and psychology, among others. Our main objective in this work is to develop a novel approach, using persistent entropy and neural networks as main tools, to recognise and classify emotions from talking-face videos. Specifically, we combine audio-signal and image-sequence information to compute a topology signature (a 9-dimensional vector) for each video. We prove that small changes in the video produce small changes in the signature. These topological signatures are fed to a neural network to distinguish between the following emotions: neutral, calm, happy, sad, angry, fearful, disgust, and surprised. The results are promising and competitive, outperforming other state-of-the-art works found in the literature.
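
To make the notion of persistent entropy concrete, here is a minimal Python sketch using its standard definition: the Shannon entropy of the normalized bar lengths of a persistence barcode. The function names, the toy barcodes, and the idea of stacking one entropy value per barcode into a vector are illustrative assumptions; this summary does not describe how the paper builds its nine barcodes from the audio signal and image sequence.

```python
import math


def persistent_entropy(barcode):
    """Shannon entropy of the normalized bar lengths of a barcode.

    `barcode` is a list of (birth, death) pairs with finite death values.
    Bars of zero or negative length are ignored.
    """
    lengths = [death - birth for birth, death in barcode if death > birth]
    total = sum(lengths)
    if total == 0:
        return 0.0
    return -sum((l / total) * math.log(l / total) for l in lengths)


def entropy_vector(barcodes):
    """Hypothetical helper: one entropy value per barcode, stacked into a list."""
    return [persistent_entropy(b) for b in barcodes]


# Toy barcodes (made up for illustration, not taken from the paper).
equal_bars = [(0.0, 1.0), (2.0, 3.0)]              # two bars of equal length
perturbed = [(0.0, 1.0), (2.0, 3.001)]             # the same bars, slightly changed
single_bar = [(0.0, 5.0)]                          # one bar only

vector = entropy_vector([equal_bars, perturbed, single_bar])
```

Two bars of equal length give the maximum entropy for two bars, `log(2)`, while a single bar gives zero. The slightly perturbed barcode yields a value close to that of the unperturbed one, which is the kind of behaviour the stability claim in the abstract refers to.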

Paper Structure

This paper contains 7 sections, 1 theorem, 8 figures, 2 tables.

Key Result

Lemma 1

The so-called topological signature associated to a given talking-face video is stable in the sense that small changes in the video produce small changes in the signature.

Figures (8)

  • Figure 1: Top: Example of a filtration obtained using the height function $h$ on its vertices. Bottom: Associated $d$-dimensional persistent homology. For example, if $d=1$ then $H_1(C_1)=\mathbb{Z}_2\stackrel{[1]}\longrightarrow H_1(C_2)=\mathbb{Z}_2\stackrel{[1\;0]}\longrightarrow H_1(C_3)=\mathbb{Z}_2\oplus \mathbb{Z}_2$.
  • Figure 2: A $3\times 5\times 5\times 2$ feedforward neural network composed of an input layer with 3 neurons, two hidden layers with 5 neurons each, and an output layer with 2 neurons.
  • Figure 3: The landmark points considered in this paper, drawn on a face in one frame of a video from the RAVDESS dataset.
  • Figure 4: The 1-skeleton of the cell complex obtained from an image sequence.
  • Figure 5: Illustration of three of the eight different filtrations considered.
  • ...and 3 more figures
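
The $3\times 5\times 5\times 2$ architecture in the Figure 2 caption can be sketched as a plain-Python forward pass. Only the layer sizes come from the caption. The ReLU hidden activations, the softmax output, the random initialization, and all function names are assumptions made for illustration. A network for the task described in the abstract would instead take the 9-dimensional signature as input and produce 8 outputs, one per emotion; its hidden-layer sizes are not stated in this summary.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Random weights (n_out rows of n_in values) and zero biases."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, weights, biases):
    """Affine map: one weighted sum plus bias per output neuron."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def relu(v):
    return [max(0.0, a) for a in v]


def softmax(v):
    m = max(v)  # subtract the maximum for numerical stability
    exps = [math.exp(a - m) for a in v]
    s = sum(exps)
    return [e / s for e in exps]


def forward(x, layers):
    """ReLU on hidden layers, softmax on the output layer (assumed choices)."""
    for weights, biases in layers[:-1]:
        x = relu(dense(x, weights, biases))
    weights, biases = layers[-1]
    return softmax(dense(x, weights, biases))


rng = random.Random(0)
sizes = [3, 5, 5, 2]  # layer sizes from the Figure 2 caption
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
probs = forward([0.2, -0.1, 0.7], layers)  # arbitrary 3-dimensional input
```

Changing `sizes` to start with 9 and end with 8 gives the input and output dimensions implied by the abstract, with the hidden sizes left as a free choice.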

Theorems & Definitions (1)

  • Lemma 1