Expressive Speech-driven Facial Animation with controllable emotions

Yutong Chen, Junhong Zhao, Wei-Qiang Zhang

TL;DR

Qualitative and quantitative evaluations show that animation generated by the proposed deep learning-based approach for expressive, speech-driven facial animation is rich in emotional facial expression while retaining accurate lip movement, outperforming other state-of-the-art methods.

Abstract

Highly realistic facial animation is in high demand, but generating it remains a challenging task. Existing speech-driven facial animation approaches can produce satisfactory mouth movement and lip synchronization, but they are weak at dramatic emotional expressions and offer little flexibility in emotion control. This paper presents a novel deep learning-based approach for generating expressive facial animation from speech that can exhibit a wide spectrum of facial expressions with controllable emotion type and intensity. We propose an emotion controller module that learns the relationship between emotion variations (e.g., type and intensity) and the corresponding facial expression parameters. It enables emotion-controllable facial animation, in which the target expression can be adjusted continuously as desired. Qualitative and quantitative evaluations show that the animation generated by our method is rich in emotional facial expression while retaining accurate lip movement, outperforming other state-of-the-art methods.
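The abstract describes the emotion controller only at a high level. To illustrate what continuous control over emotion type and intensity can look like, here is a minimal sketch built on an assumed design that is not taken from the paper. The emotion is encoded as a one-hot vector scaled by an intensity in [0, 1]. A learned matrix maps that code to an offset, and the offset is added to speech-driven expression parameters. The label set, `emotion_code`, `apply_emotion`, and the `weights` matrix are all hypothetical names introduced for illustration.

```python
from typing import List

# Hypothetical emotion label set; the paper's actual classes may differ.
EMOTIONS: List[str] = [
    "neutral", "happiness", "sadness", "anger", "fear", "surprise", "disgust",
]


def emotion_code(emotion: str, intensity: float) -> List[float]:
    """Encode emotion type and intensity as a one-hot vector scaled by intensity."""
    if emotion not in EMOTIONS:
        raise ValueError(f"unknown emotion: {emotion!r}")
    if not 0.0 <= intensity <= 1.0:
        raise ValueError("intensity must lie in [0, 1]")
    code = [0.0] * len(EMOTIONS)
    code[EMOTIONS.index(emotion)] = intensity
    return code


def apply_emotion(
    speech_expr: List[float],
    weights: List[List[float]],
    emotion: str,
    intensity: float,
) -> List[float]:
    """Add an emotion-dependent offset to speech-driven expression parameters.

    `weights` is a hypothetical learned matrix of shape
    (num_expr_params, num_emotions). Column k holds the expression offset
    for emotion k at full intensity, so the offset scales linearly with
    intensity, and intensity 0 leaves the speech-driven parameters unchanged.
    """
    code = emotion_code(emotion, intensity)
    # Matrix-vector product: one offset value per expression parameter.
    offset = [sum(w * c for w, c in zip(row, code)) for row in weights]
    return [s + o for s, o in zip(speech_expr, offset)]


if __name__ == "__main__":
    speech = [0.5, -0.25, 0.0]  # toy speech-driven expression parameters
    W = [[float(k) for k in range(len(EMOTIONS))] for _ in range(3)]  # toy weights
    for level in (1.0, 0.75, 0.5):
        print(level, apply_emotion(speech, W, "fear", level))
```

Because the mapping in this sketch is linear in the intensity, sweeping the intensity (for example 1.0, 0.75, and 0.5, the values shown in Figure 3) interpolates smoothly between the speech-only expression and the full-strength emotional expression. A learned, nonlinear controller would not necessarily behave this way.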

Paper Structure (17 sections, 12 equations, 5 figures, 3 tables)

Figures (5)

  • Figure 1: An overview of our pipeline.
  • Figure 2: Training/inference pipeline of our proposed emotion control module.
  • Figure 3: Results of emotion animation from the same speech with various emotion classes and customized intensities. The texture is derived from the ground truth video for visualization. Left: animation frames of happiness; Right: animation frames of fear. From top to bottom, the intensities are 1.0, 0.75, and 0.5.
  • Figure 4: Qualitative comparison with state-of-the-art methods on angry animation. The sentence "Dogs are sitting by the door" does not appear in our training dataset. Frames are taken from the beginning (top), middle (middle), and end (bottom) of the sentence. See the submitted video for more visualizations.
  • Figure 5: Results of EMOCA face geometry reconstruction.