FacialFlowNet: Advancing Facial Optical Flow Estimation with a Diverse Dataset and a Decomposed Model

Jianzhi Lu, Ruian He, Shili Zhou, Weimin Tan, Bo Yan

TL;DR

This paper proposes FacialFlowNet (FFN), a novel large-scale facial optical flow dataset, and the Decomposed Facial Flow Model (DecFlow), the first method capable of decomposing facial flow. Coupled with FFN, DecFlow outperforms existing methods in both synthetic and real-world scenarios, enhancing facial expression analysis.

Abstract

Facial movements play a crucial role in conveying attitude and intentions, and facial optical flow provides a dynamic and detailed representation of them. However, the scarcity of datasets and the lack of a modern baseline hinder progress in facial optical flow research. This paper proposes FacialFlowNet (FFN), a novel large-scale facial optical flow dataset, and the Decomposed Facial Flow Model (DecFlow), the first method capable of decomposing facial flow. FFN comprises 9,635 identities and 105,970 image pairs, offering unprecedented diversity for detailed facial and head motion analysis. DecFlow features a facial semantic-aware encoder and a decomposed flow decoder, and it accurately estimates facial flow and decomposes it into head and expression components. Comprehensive experiments demonstrate that FFN significantly enhances the accuracy of facial flow estimation across various optical flow methods, achieving up to an 11% reduction in Endpoint Error (EPE) (from 3.91 to 3.48). Moreover, DecFlow, when coupled with FFN, outperforms existing methods in both synthetic and real-world scenarios, enhancing facial expression analysis. The decomposed expression flow achieves a substantial accuracy improvement of 18% (from 69.1% to 82.1%) in micro-expression recognition. These contributions represent a significant advancement in facial motion analysis and optical flow estimation. Codes and datasets can be found.

Paper Structure (21 sections, 6 equations, 9 figures, 5 tables)

This paper contains 21 sections, 6 equations, 9 figures, 5 tables.

Figures (9)

  • Figure 1: The proposed FacialFlowNet dataset and DecFlow method. (a) FacialFlowNet (FFN) contains frames and optical flow labels for the overall facial motion as well as for head motion and expression. (b) DecFlow is designed to estimate accurate facial flow and further decompose it into head flow and expression flow. We show the optical flow and error map of GMA (Jiang et al., 2021) (left) and our model (right). (c) Our method can generalize to real-world datasets like MEAD (Wang et al., 2020), and the expression flow can be utilized for downstream analysis.
  • Figure 2: Illustration of various optical flow datasets.
  • Figure 3: The dataset generation pipeline. It takes a UV texture, a set of FLAME parameters, a background image, and camera/light parameters as input, producing video sequences of 5, 10, 15, or 20 frames with corresponding optical flow labels. $I_{t}^{f}$ and $I_{t}^{h}$ represent the $t$-th frame in FFN-F and FFN-H respectively. From $I_{t}^{f}$ to $I_{t+1}^{f}$, we get the facial flow, denoted as $F_{t}^{f}$; from $I_{t}^{f}$ to $I_{t+1}^{h}$, we obtain the head flow, denoted as $F_{t}^{h}$. Subtracting $F_{t}^{h}$ from $F_{t}^{f}$ yields the expression flow, denoted as $F_{t}^{e}$.
  • Figure 4: The rendered images with UV-Textures obtained from FFHQ-Norm with different methods.
  • Figure 5: The t-SNE visualization of emotion features for both the AffectNet dataset and our dataset. Our dataset preserves a considerable degree of expression diversity.
  • ...and 4 more figures
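
The Figure 3 caption defines the expression flow as the facial flow minus the head flow, $F_{t}^{e} = F_{t}^{f} - F_{t}^{h}$, and the abstract reports accuracy as Endpoint Error (EPE). The following is a minimal NumPy sketch of both quantities, not the authors' code. It assumes flow fields are stored as arrays of shape `(H, W, 2)` and uses the standard definition of EPE as the mean Euclidean distance between predicted and ground-truth flow vectors; the paper's exact evaluation protocol (for example, any masking) may differ. The function names and the toy flow values are hypothetical.

```python
import numpy as np


def expression_flow(facial_flow, head_flow):
    """Per-pixel subtraction F^e = F^f - F^h, as described in the Figure 3 caption.

    Both inputs are assumed to be (H, W, 2) arrays of (dx, dy) displacements.
    """
    facial_flow = np.asarray(facial_flow, dtype=np.float64)
    head_flow = np.asarray(head_flow, dtype=np.float64)
    if facial_flow.shape != head_flow.shape:
        raise ValueError("flow fields must have the same shape")
    return facial_flow - head_flow


def endpoint_error(pred_flow, gt_flow):
    """Standard average EPE: mean L2 distance between flow vectors."""
    diff = np.asarray(pred_flow, dtype=np.float64) - np.asarray(gt_flow, dtype=np.float64)
    return float(np.linalg.norm(diff, axis=-1).mean())


def relative_reduction(before, after):
    """Relative reduction of an error metric, as a fraction of the original value."""
    return (before - after) / before


# Toy 2x2 flow fields (hypothetical values, for illustration only).
facial = np.zeros((2, 2, 2))
facial[..., 0] = 3.0  # horizontal displacement, shared with the head motion
facial[..., 1] = 4.0  # vertical displacement, due to expression only
head = np.zeros((2, 2, 2))
head[..., 0] = 3.0

expr = expression_flow(facial, head)   # (0, 4) at every pixel
epe = endpoint_error(head, facial)     # distance between (3, 0) and (3, 4)

# EPE figures quoted in the abstract: 3.91 -> 3.48.
epe_gain = relative_reduction(3.91, 3.48)
```

With the abstract's figures, `relative_reduction(3.91, 3.48)` is roughly 0.11, which matches the reported "up to an 11% reduction" in EPE.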