FC-4DFS: Frequency-controlled Flexible 4D Facial Expression Synthesizing

Xin Lu, Chuanqing Zhuang, Zhengda Lu, Yiqun Wang, Jun Xiao

TL;DR

A frequency-controlled LSTM network generates 4D facial expression sequences of a given length, frame by frame, from a given neutral landmark. A temporal coherence loss is proposed to enhance the perception of temporal sequence motion and improve the accuracy of relative displacements.

Abstract

4D facial expression synthesizing is a critical problem in computer vision and graphics. Current methods lack flexibility and smoothness when simulating the inter-frame motion of expression sequences. In this paper, we propose FC-4DFS, a frequency-controlled 4D facial expression synthesizing method. Specifically, we introduce a frequency-controlled LSTM network that generates 4D facial expression sequences of a given length, frame by frame, from a given neutral landmark. We also propose a temporal coherence loss to enhance the perception of temporal sequence motion and improve the accuracy of relative displacements. Furthermore, we design a Multi-level Identity-Aware Displacement Network, based on a cross-attention mechanism, to reconstruct the 4D facial expression sequences from landmark sequences. Finally, FC-4DFS achieves flexible, state-of-the-art (SOTA) generation of 4D facial expression sequences of different lengths on the CoMA and Florence4D datasets. The code will be available on GitHub.
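The abstract does not give the formula for the temporal coherence loss, so the following is only a minimal sketch of one plausible reading: penalize the mismatch between predicted and ground-truth inter-frame landmark displacements. The function names `frame_displacements` and `temporal_coherence_loss`, the mean-squared-error form, and the nested-list data layout are all assumptions made for illustration and are not taken from the paper.

```python
from typing import List, Sequence

# One frame is a list of landmarks; each landmark is an (x, y, z) triple.
Frame = Sequence[Sequence[float]]


def frame_displacements(seq: Sequence[Frame]) -> List[List[List[float]]]:
    """Return the per-landmark displacement between consecutive frames."""
    return [
        [[b - a for a, b in zip(p_prev, p_next)]
         for p_prev, p_next in zip(prev, nxt)]
        for prev, nxt in zip(seq[:-1], seq[1:])
    ]


def temporal_coherence_loss(pred: Sequence[Frame], target: Sequence[Frame]) -> float:
    """Mean squared error between predicted and target inter-frame displacements.

    ASSUMPTION: an illustrative formulation, not the paper's exact definition.
    """
    if len(pred) != len(target) or len(pred) < 2:
        raise ValueError("sequences must have equal length of at least 2 frames")
    total, count = 0.0, 0
    for f_pred, f_tgt in zip(frame_displacements(pred), frame_displacements(target)):
        for p_pred, p_tgt in zip(f_pred, f_tgt):
            for a, b in zip(p_pred, p_tgt):
                total += (a - b) ** 2
                count += 1
    return total / count


# Toy sequences: 3 frames, 1 landmark each (hypothetical data).
target = [[[0.0, 0.0, 0.0]], [[1.0, 0.0, 0.0]], [[2.0, 0.0, 0.0]]]
shifted = [[[5.0, 0.0, 0.0]], [[6.0, 0.0, 0.0]], [[7.0, 0.0, 0.0]]]  # same motion, offset
jittery = [[[0.0, 0.0, 0.0]], [[2.0, 0.0, 0.0]], [[2.0, 0.0, 0.0]]]  # uneven motion

loss_shifted = temporal_coherence_loss(shifted, target)
loss_jittery = temporal_coherence_loss(jittery, target)
```

In this sketch a constant offset leaves the loss at zero because only relative motion is compared, so a term of this kind would normally be paired with a per-frame position loss.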

Paper Structure

This paper contains 14 sections, 12 equations, 7 figures, and 3 tables.

Figures (7)

  • Figure 1: Quantitative generation comparison between the ground truth, Motion3D, LM-4DGAN, and our FC-LSTM for different identities.
  • Figure 2: The overview of our FC-4DFS framework.
  • Figure 3: The structure of our frequency-controlled LSTM.
  • Figure 4: The structure of our Multi-level Identity-Aware Displacement Net.
  • Figure 5: Qualitative results of sequences generated by Motion3D, LM-4DGAN, and our method, alongside the ground truth, for different subjects.
  • ...and 2 more figures