FaceShifter: Towards High Fidelity And Occlusion Aware Face Swapping

Lingzhi Li, Jianmin Bao, Hao Yang, Dong Chen, Fang Wen

TL;DR

FaceShifter tackles high-fidelity, occlusion-aware face swapping with a two-stage approach. The first stage, AEI-Net, adaptively fuses the source identity with multi-level target attributes via Adaptive Attentional Denormalization. The second stage, HEAR-Net, refines occluded regions in a self-supervised manner, guided by heuristic reconstruction errors. The method reports better identity preservation and more faithful attribute rendering than prior methods on FaceForensics++ and real-world images, and it recovers occlusions without manual annotations. The framework is subject-agnostic.

Abstract

In this work, we propose a novel two-stage framework, called FaceShifter, for high-fidelity and occlusion-aware face swapping. Unlike many existing face swapping works that leverage only limited information from the target image when synthesizing the swapped face, our framework, in its first stage, generates the swapped face with high fidelity by exploiting and integrating the target attributes thoroughly and adaptively. We propose a novel attributes encoder for extracting multi-level target face attributes, and a new generator with carefully designed Adaptive Attentional Denormalization (AAD) layers to adaptively integrate the identity and the attributes for face synthesis. To address challenging facial occlusions, we append a second stage consisting of a novel Heuristic Error Acknowledging Refinement Network (HEAR-Net). It is trained to recover anomaly regions in a self-supervised way without any manual annotations. Extensive experiments on wild faces demonstrate that our face swapping results are not only considerably more perceptually appealing, but also better at preserving identity than other state-of-the-art methods.
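
The adaptive integration performed by an AAD layer can be illustrated with a minimal sketch. The abstract does not spell out the layer's formula, so everything below is an assumption made for illustration: an already-normalized activation is modulated once by attribute-derived parameters and once by identity-derived parameters, and a sigmoid attention mask blends the two results. The function name `aad_blend` and all parameter names are hypothetical, and the per-element lists stand in for full feature maps.

```python
import math
from typing import List


def aad_blend(
    h_norm: List[float],
    gamma_att: List[float],
    beta_att: List[float],
    gamma_id: List[float],
    beta_id: List[float],
    mask_logits: List[float],
) -> List[float]:
    """Blend attribute- and identity-modulated activations (illustrative only).

    h_norm is assumed to be already normalized. All argument names are
    hypothetical; in a real network the modulation parameters and the
    mask logits would be predicted by learned layers.
    """
    out = []
    for h, ga, ba, gi, bi, m in zip(
        h_norm, gamma_att, beta_att, gamma_id, beta_id, mask_logits
    ):
        attr_term = ga * h + ba              # attribute-conditioned activation
        id_term = gi * h + bi                # identity-conditioned activation
        w = 1.0 / (1.0 + math.exp(-m))       # attention weight in (0, 1)
        out.append((1.0 - w) * attr_term + w * id_term)
    return out


# A mask logit of 0 weights both terms equally; a large logit favors identity.
balanced = aad_blend([1.0], [2.0], [0.0], [4.0], [0.0], [0.0])
identity_heavy = aad_blend([1.0], [2.0], [0.0], [4.0], [0.0], [50.0])
```

Under this assumed form, the mask lets each position decide whether identity or attribute information should dominate, which is one plausible reading of "adaptively integrate".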

Paper Structure

This paper contains 12 sections, 15 equations, 13 figures, 2 tables.

Figures (13)

  • Figure 1: The face in the source image is taken to replace the face in the target image. Results of FaceShifter appear on the right.
  • Figure 2: Failure cases of a previous method on the FaceForensics++ [rossler2019faceforensics++] dataset. From left to right: the input source images, the input target images, the results of FaceSwap [faceswap], and the results of our method. FaceSwap first synthesizes the inner face region and then blends it into the target face. This strategy causes artifacts such as defective lighting on the nose (row 1), failure to preserve the face shape of the source identity (row 2), and mismatched image resolutions (row 3). Our method addresses all of these issues.
  • Figure 3: AEI-Net for the first stage. AEI-Net is composed of an Identity Encoder, a Multi-level Attributes Encoder, and an AAD-Generator. The AAD-Generator integrates identity and attribute information at multiple feature levels using cascaded AAD ResBlks, which are built on AAD layers.
  • Figure 4: HEAR-Net for the second stage. $\hat{Y}_{t,t}$ is the reconstruction of the target image $X_t$, i.e., $\hat{Y}_{t,t}=\texttt{AEI-Net}(X_t, X_t)$. $\hat{Y}_{s,t}$ is the swapped face from the first stage.
  • Figure 5: Comparison with FaceSwap [faceswap], Nirkin et al. [nirkin2018face], DeepFakes [deepfake], and IPGAN [Bao_ipgan] on FaceForensics++ [rossler2019faceforensics++] face images. Our results better preserve the face shapes of the source identities and are more faithful to the target attributes (e.g., lighting and image resolution).
  • ...and 8 more figures
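
The Figure 4 caption defines the target reconstruction as $\hat{Y}_{t,t}=\texttt{AEI-Net}(X_t, X_t)$. A minimal sketch of how a heuristic reconstruction error could be derived from it follows. It assumes the error is the pixel-wise difference between the target image and its reconstruction, which the captions listed here do not state explicitly. `toy_aei_net` is a hypothetical stand-in for the trained first-stage network, and the grayscale nested lists stand in for image tensors.

```python
from typing import Callable, List

Image = List[List[float]]  # toy grayscale H x W image with values in [0, 1]


def heuristic_error(
    x_t: Image, aei_net: Callable[[Image, Image], Image]
) -> Image:
    """Return X_t - AEI-Net(X_t, X_t), computed pixel by pixel.

    Assumption: the heuristic error is the plain difference between the
    target image and its self-reconstruction.
    """
    y_tt = aei_net(x_t, x_t)  # reconstruct the target from itself
    return [
        [p - q for p, q in zip(row_x, row_y)]
        for row_x, row_y in zip(x_t, y_tt)
    ]


def toy_aei_net(source: Image, target: Image) -> Image:
    """Hypothetical stand-in: copies the target but cannot reproduce
    pixels brighter than 0.5, mimicking a region the network fails on."""
    return [[min(p, 0.5) for p in row] for row in target]


# The bright pixel (1.0) plays the role of an occluder the network misses.
x_t = [[0.2, 0.4], [1.0, 0.3]]
error_map = heuristic_error(x_t, toy_aei_net)
```

In this toy setup the error map is zero wherever the reconstruction succeeds and nonzero only at the pixel the stand-in network cannot reproduce, which is the kind of signal a refinement stage could use to locate occlusions without manual labels.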