End to End Face Reconstruction via Differentiable PnP

Yiren Lu, Huawei Wei

TL;DR

This work designs a two-branch network, with one branch for face reconstruction and one for face landmark detection, and uses a differentiable PnP (Perspective-n-Points) layer to fine-tune the outputs of the two branches.

Abstract

This is a challenge report for the ECCV 2022 WCPA Challenge, Face Reconstruction Track. It briefly explains how we approached the challenge. We design a two-branch network whose branches perform face reconstruction and face landmark detection. The former outputs canonical 3D face coordinates. The latter outputs pixel coordinates, i.e., the 2D projections of the 3D coordinates under the head pose and perspective projection. In addition, we use a differentiable PnP (Perspective-n-Points) layer to fine-tune the outputs of the two branches. Our method achieves competitive quantitative results on the MVP-Human dataset and won third prize in the challenge.
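To make the relationship between the two branch outputs concrete, the following minimal sketch projects canonical 3D points to pixel coordinates with a rigid head pose and a pinhole camera. The function name `project_points`, the intrinsics `K`, and the pose values are illustrative assumptions, not values from the report.

```python
import numpy as np

def project_points(points_3d, R, t, K):
    """Project (N, 3) canonical points to (N, 2) pixel coordinates."""
    cam = points_3d @ R.T + t        # rigid transform into the camera frame
    uvw = cam @ K.T                  # apply pinhole intrinsics
    return uvw[:, :2] / uvw[:, 2:3]  # perspective divide

# Hypothetical intrinsics for an 800x800 image (assumed, not from the report).
K = np.array([[800.0, 0.0, 400.0],
              [0.0, 800.0, 400.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                        # identity head rotation (assumed)
t = np.array([0.0, 0.0, 2.0])        # head placed 2 units in front of the camera
pts = np.array([[0.0, 0.0, 0.0],
                [0.1, -0.1, 0.0]])

pixels = project_points(pts, R, t, K)
print(pixels)
```

Under these assumptions, the landmark branch would be trained to predict `pixels`, while the reconstruction branch predicts `pts`; PnP is the inverse problem of recovering `R` and `t` from the two.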

Paper Structure

This paper contains 25 sections, 7 equations, 2 figures, and 3 tables.

Figures (2)

  • Figure 1: Overall pipeline of our method. First, the 800$\times$800 image goes through a Coarse Landmark Detector, and the resulting coarse landmarks are used to crop the face. The cropped 256$\times$256 image is then fed into a shared backbone with separate regressors, which output a canonical mesh and landmarks in the 256$\times$256 crop. Finally, the unwarped landmarks and the canonical mesh are used to recover the rotation and translation through PnP.
  • Figure 2: Visual comparison between our method and the ground truth obtained by the structured light sensor of the iPhone 11. In some extreme cases, such as the first and fifth columns, our results look better than the sensor-derived ground truth, which we take as evidence that the network has captured the essence of the task.
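The last two steps of the Figure 1 pipeline (unwarping crop landmarks to the original image, then solving for pose) can be sketched as follows. This is only an illustration: it uses a classical, non-differentiable DLT solver as a stand-in for the differentiable PnP layer of the report, and the helper names, crop box, intrinsics, and synthetic mesh are all assumptions.

```python
import numpy as np

def unwarp_landmarks(lm_crop, box_xy, box_size, crop_size=256):
    """Map landmarks from crop pixels back to original-image pixels."""
    return lm_crop * (box_size / crop_size) + np.asarray(box_xy, dtype=float)

def solve_pnp_dlt(points_3d, pixels, K):
    """Estimate R, t from 3D-2D matches with a plain DLT (needs >= 6 non-coplanar points)."""
    ones = np.ones((points_3d.shape[0], 1))
    # Remove the intrinsics so that x_n ~ [R | t] X.
    xn = np.hstack([pixels, ones]) @ np.linalg.inv(K).T
    xn = xn[:, :2] / xn[:, 2:3]
    Xh = np.hstack([points_3d, ones])
    zeros = np.zeros_like(Xh)
    # Two linear equations per point in the 12 entries of P = [R | t].
    A = np.vstack([
        np.hstack([-Xh, zeros, xn[:, :1] * Xh]),
        np.hstack([zeros, -Xh, xn[:, 1:2] * Xh]),
    ])
    _, _, Vt = np.linalg.svd(A)
    P = Vt[-1].reshape(3, 4)           # null vector, defined up to scale and sign
    if np.linalg.det(P[:, :3]) < 0:    # pick the sign that gives a proper rotation
        P = -P
    U, S, Vt_r = np.linalg.svd(P[:, :3])
    R = U @ Vt_r                       # nearest rotation matrix
    t = P[:, 3] / S.mean()             # undo the arbitrary scale
    return R, t

# Synthetic check with an assumed mesh, pose, intrinsics, and crop box.
rng = np.random.default_rng(0)
mesh = rng.uniform(-0.1, 0.1, size=(20, 3))
a = 0.3
R_true = np.array([[np.cos(a), 0.0, np.sin(a)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(a), 0.0, np.cos(a)]])
t_true = np.array([0.05, -0.02, 0.6])
K = np.array([[800.0, 0.0, 400.0], [0.0, 800.0, 400.0], [0.0, 0.0, 1.0]])

uvw = (mesh @ R_true.T + t_true) @ K.T
full_px = uvw[:, :2] / uvw[:, 2:3]
box_xy, box_size = np.array([300.0, 250.0]), 400.0
lm_crop = (full_px - box_xy) * (256 / box_size)   # what a landmark branch would see

R_est, t_est = solve_pnp_dlt(mesh, unwarp_landmarks(lm_crop, box_xy, box_size), K)
```

On noise-free input, `R_est` and `t_est` should match `R_true` and `t_true` up to numerical precision. A differentiable PnP layer additionally lets gradients of a pose loss flow back into both the mesh and the landmark predictions, which this sketch does not attempt.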