HifiFace: 3D Shape and Semantic Prior Guided High Fidelity Face Swapping

Yuhan Wang, Xu Chen, Junwei Zhu, Wenqing Chu, Ying Tai, Chengjie Wang, Jilin Li, Yongjian Wu, Feiyue Huang, Rongrong Ji

TL;DR

HifiFace tackles high-fidelity face swapping in two ways: it enforces a 3D shape-aware identity through 3DMM-based shape supervision, and it blends encoder and decoder features with a Semantic Facial Fusion (SFF) module. The 3D shape-aware identity extractor combines the source identity with the target expression and pose to preserve face geometry, while SFF enables realistic texturing and occlusion handling without compromising identity. Two losses, a 3D shape-aware identity loss and a realism loss, drive geometric fidelity and photorealism respectively. Experiments on in-the-wild faces show better face shape preservation and image realism than prior state-of-the-art methods, which makes the method relevant to both face manipulation and forgery detection.

Abstract

In this work, we propose a high-fidelity face swapping method, called HifiFace, which preserves the face shape of the source face well and generates photo-realistic results. Unlike existing face swapping works that use only a face recognition model to maintain identity similarity, we propose a 3D shape-aware identity that controls the face shape with geometric supervision from a 3DMM and a 3D face reconstruction method. Meanwhile, we introduce the Semantic Facial Fusion module, which optimizes the combination of encoder and decoder features and performs adaptive blending, making the results more photo-realistic. Extensive experiments on faces in the wild demonstrate that our method preserves identity better, especially face shape, and generates more photo-realistic results than previous state-of-the-art methods.
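
The 3D shape-aware identity can be pictured as a coefficient swap followed by a fusion with a recognition embedding. The sketch below is only an illustration under stated assumptions: the `Coeffs3DMM` container, the function names, the tiny coefficient sizes, and the use of plain concatenation are all hypothetical and are not taken from the paper. It keeps the source's shape coefficients, takes expression and pose from the target, and appends a face recognition embedding.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Coeffs3DMM:
    """Hypothetical container for 3DMM coefficients regressed from one image."""
    identity: List[float]    # face shape (identity) coefficients
    expression: List[float]  # expression coefficients
    pose: List[float]        # head pose parameters


def fuse_coeffs(source: Coeffs3DMM, target: Coeffs3DMM) -> Coeffs3DMM:
    """Keep the source's face shape, but the target's expression and pose."""
    return Coeffs3DMM(
        identity=list(source.identity),
        expression=list(target.expression),
        pose=list(target.pose),
    )


def shape_aware_identity(fused: Coeffs3DMM, id_embedding: List[float]) -> List[float]:
    """Concatenate fused 3D coefficients with a recognition embedding.

    Concatenation is an assumption made for this sketch.
    """
    return fused.identity + fused.expression + fused.pose + list(id_embedding)


# Toy values; real coefficient vectors and embeddings are much longer.
source = Coeffs3DMM(identity=[1.0, 2.0], expression=[0.1], pose=[0.0, 0.0, 0.0])
target = Coeffs3DMM(identity=[9.0, 9.0], expression=[0.7], pose=[0.3, 0.1, 0.2])
fused = fuse_coeffs(source, target)
v_sid = shape_aware_identity(fused, id_embedding=[0.5, 0.5])
```

In this toy setup, rendering `fused` with a 3DMM would give a face with the source's shape in the target's expression and pose, which is the kind of geometric signal the abstract refers to as supervision.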

Paper Structure

This paper contains 21 sections, 13 equations, 14 figures, 2 tables.

Figures (14)

  • Figure 1: Face swapping results generated by our HifiFace. The face in the target image is replaced by the face in the source image.
  • Figure 2: The pipelines of previous works and our HifiFace. (a) The source-oriented pipeline uses 3D fitting or reenactment to generate the inner face region and blends it into the target image; $\textit{F}_r$ denotes the face region of the result. (b) The target-oriented pipeline uses a face recognition network to extract identity and combines the encoder features with the identity in the decoder. (c) Our pipeline consists of four parts: the encoder, the decoder, the 3D shape-aware identity extractor, and the SFF module. The encoder extracts features from $I_t$, and the decoder fuses the encoder features with the 3D shape-aware identity feature. Finally, the SFF module further improves the image quality.
  • Figure 3: Details of the 3D shape-aware identity extractor and the SFF module. (a) The 3D shape-aware identity extractor uses $\boldsymbol{F}_{3d}$ (a 3D face reconstruction network) and $\boldsymbol{F}_{id}$ (a face recognition network) to generate the shape-aware identity. (b) The SFF module recombines the encoder and decoder features using $\boldsymbol{M}_{low}$ and performs the final blending using $\boldsymbol{M}_{r}$. $\boldsymbol{F}_{up}$ denotes the upsampling module.
  • Figure 4: Comparison with FSGAN, SimSwap and FaceShifter. Our results preserve the source face shape and the target attributes well and have higher image quality, even in cases with occlusion.
  • Figure 5: (a) Comparison with AOT. (b) Comparison with DF.
  • ...and 9 more figures
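
The two mask-driven steps named in the Figure 3(b) caption can be sketched as follows. This is a minimal sketch, not the authors' implementation: it assumes both steps are per-element convex blends, that mask values lie in [0, 1], and that a mask value of 1 selects the decoder features or the generated image. The helper `mask_blend` and the flattened toy lists are hypothetical.

```python
from typing import List


def mask_blend(mask: List[float], inside: List[float], outside: List[float]) -> List[float]:
    """Per-element convex blend: mask * inside + (1 - mask) * outside."""
    if not (len(mask) == len(inside) == len(outside)):
        raise ValueError("mask, inside and outside must have the same length")
    if any(m < 0.0 or m > 1.0 for m in mask):
        raise ValueError("mask values must lie in [0, 1]")
    return [m * a + (1.0 - m) * b for m, a, b in zip(mask, inside, outside)]


# Step 1 (feature level): recombine decoder and encoder features with M_low.
z_dec = [1.0, 1.0, 1.0, 1.0]   # toy decoder features
z_enc = [0.0, 0.0, 0.0, 0.0]   # toy encoder features
m_low = [1.0, 0.5, 0.0, 0.25]  # toy low-resolution mask
z_fuse = mask_blend(m_low, z_dec, z_enc)

# Step 2 (image level): blend the generated image into the target with M_r.
i_out = [0.5, 0.5, 0.5, 0.5]   # toy generated pixels
i_t = [0.0, 1.0, 0.0, 1.0]     # toy target pixels
m_r = [1.0, 1.0, 0.0, 0.0]     # toy final mask
i_r = mask_blend(m_r, i_out, i_t)
```

Under these assumptions, wherever a mask is 0 the target content passes through unchanged, which is one plausible way a learned mask could leave occluders such as hands or glasses untouched.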