3DFlowRenderer: One-shot Face Re-enactment via Dense 3D Facial Flow Estimation

Siddharth Nijhawan, Takuya Yashima, Tamaki Kojima

TL;DR

The paper tackles the challenge of one-shot face re-enactment under large pose changes by blending 2D warping with 3D geometric reasoning. It introduces a four-stage pipeline (pre-processing, 3D warping, image refinement, and image inpainting) that uses dense 3D facial flow guided by 3DMM priors to warp the source foreground in 3D feature space, with a TransUNet-based refinement and a dedicated background inpainting step. A Cyclic warp loss is proposed to regularize motion estimation, and a two-phase training strategy ensures stable learning of 3D warping, refinement, and inpainting. Evaluations on VoxCeleb show state-of-the-art realism, accurate expression transfer, and faithful background reconstruction, demonstrating robustness to extreme head poses and cross-identity transfers with reduced artifacts.

Abstract

Facial expression transfer in the one-shot setting has grown in popularity in the research community, with a focus on precise control of expressions. Existing techniques show compelling results in perceiving expressions, but they lack robustness to extreme head poses. They also struggle to reconstruct background details accurately, which hinders realism. In this paper, we propose a novel warping technique that integrates the advantages of both 2D and 3D methods to achieve robust face re-enactment. We generate dense 3D facial flow fields in feature space to warp an input image based on target expressions without depth information. This enables explicit 3D geometric control for re-enacting misaligned source and target faces. We regularize the motion estimation capability of the 3D flow prediction network through the proposed "Cyclic warp loss" by converting warped 3D features back into 2D RGB space. To generate a finer facial region with a natural background, our framework first renders only the facial foreground region and then learns to inpaint the blank area left by the translation of the source face, thus reconstructing the detailed background without any unwanted pixel motion. Extensive evaluation shows that our method outperforms state-of-the-art techniques in rendering artifact-free facial images.
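To make the idea of warping in 3D feature space concrete, the following is a minimal NumPy sketch of backward-warping a feature volume with a dense 3D flow field using trilinear interpolation. It is an illustration under stated assumptions, not the paper's implementation: the function name `warp_volume`, the `(C, D, H, W)` layout, the `(dz, dy, dx)` flow convention in voxel units, and the border clamping are all hypothetical choices, and the flow here is set by hand rather than predicted by a network.

```python
import numpy as np


def warp_volume(feat, flow):
    """Backward-warp a feature volume with a dense 3D flow field.

    feat: array of shape (C, D, H, W), the source feature volume.
    flow: array of shape (3, D, H, W) holding per-voxel offsets
          (dz, dy, dx) in voxel units (an assumed convention).
    The output voxel at (z, y, x) samples feat at (z+dz, y+dy, x+dx).
    """
    _, D, H, W = feat.shape
    zz, yy, xx = np.meshgrid(
        np.arange(D), np.arange(H), np.arange(W), indexing="ij"
    )

    # Sampling positions, clamped to the volume (border padding).
    z = np.clip(zz + flow[0], 0, D - 1)
    y = np.clip(yy + flow[1], 0, H - 1)
    x = np.clip(xx + flow[2], 0, W - 1)

    # Integer corners surrounding each sampling position.
    z0, y0, x0 = (np.floor(v).astype(int) for v in (z, y, x))
    z1 = np.minimum(z0 + 1, D - 1)
    y1 = np.minimum(y0 + 1, H - 1)
    x1 = np.minimum(x0 + 1, W - 1)

    # Fractional distances from the lower corner along each axis.
    wz, wy, wx = z - z0, y - y0, x - x0

    # Accumulate the eight trilinear corner contributions.
    out = np.zeros_like(feat, dtype=float)
    for zi, az in ((z0, 1 - wz), (z1, wz)):
        for yi, ay in ((y0, 1 - wy), (y1, wy)):
            for xi, ax in ((x0, 1 - wx), (x1, wx)):
                out += feat[:, zi, yi, xi] * (az * ay * ax)
    return out


# Toy usage: shift a small volume by one voxel along the x axis.
feat = np.arange(24, dtype=float).reshape(1, 2, 3, 4)
flow = np.zeros((3, 2, 3, 4))
flow[2] = 1.0  # dx = +1 everywhere
warped = warp_volume(feat, flow)
```

Because every step is a weighted sum of feature values, the operation is differentiable with respect to both the features and the flow, which is what allows a flow prediction network to be trained through the warp. The Cyclic warp loss itself is not reproduced here, since the abstract does not give its formula.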

Paper Structure

This paper contains 18 sections, 9 equations, 14 figures, 3 tables.

Figures (14)

  • Figure 1: Qualitative comparisons with state-of-the-art methods in cases of extreme poses. Our method performs well in both self-identity re-enactment (first row) and cross-identity re-enactment (second row), especially in cases where the target pose differs significantly from the source pose.
  • Figure 2: Overview of 3DFlowRenderer.
  • Figure 3: The proposed methodology to compute Cyclic warp loss $\mathcal{L}_{cw}$ during training.
  • Figure 4: The proposed methodology to compute (a) 3D warping loss $\mathcal{L}_{3dw}$ and (b) 3D feature loss $\mathcal{L}_{3df}$ during training.
  • Figure 5: The proposed methodology for training the inpainting network in the first phase.
  • ...and 9 more figures