
Face Swap via Diffusion Model

Feifei Wang

TL;DR

This work tackles robust face swapping between two portraits by formulating a diffusion-model pipeline that preserves source identity while matching target facial geometry. It combines IP-Adapter for identity-aware encoding, ControlNet for multi-conditional guidance (including Canny edges and facial annotations), and Stable Diffusion inpainting, augmented with DreamBooth-LoRA customization and a facial-guidance loss to improve alignment; CodeFormer is used for post-processing restoration. Quantitatively, the method shows improved expression, pose, and shape alignment on CelebA-HQ but incurs a modest drop in identity fidelity compared with a DiffFace baseline. The results demonstrate enhanced controllability and realism in swapped faces, though the naturalness of the inpainted region remains a challenge for fully seamless results.

Abstract

This technical report presents a diffusion-model-based framework for face swapping between two portrait images. The basic framework consists of three components, IP-Adapter, ControlNet, and Stable Diffusion's inpainting pipeline, for face feature encoding, multi-conditional generation, and face inpainting, respectively. In addition, I introduce facial guidance optimization and CodeFormer-based blending to further improve generation quality. Specifically, I use a recent lightweight customization method (DreamBooth-LoRA) to enforce identity consistency by 1) using the rare identifier "sks" to represent the source identity, and 2) injecting the image features of the source portrait into each cross-attention layer, in the same way as the text features. I then rely on the strong inpainting ability of Stable Diffusion and use the Canny edge image and the face detection annotation of the target portrait as conditions to guide ControlNet's generation and align the source portrait with the target portrait. To further correct face alignment, I add a facial guidance loss that optimizes the text embedding during sample generation. The code is available at: https://github.com/somuchtome/Faceswap
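
The inpainting component regenerates only the masked face region and leaves the rest of the target portrait untouched. A minimal sketch of that masked compositing idea follows, assuming a soft mask with values in [0, 1]. The function name `blend_inpainted` and the toy grayscale arrays are illustrative assumptions and are not taken from the author's repository; the actual pipeline operates inside Stable Diffusion and may blend differently.

```python
from typing import List

Image = List[List[float]]  # toy grayscale image, values in [0, 1]


def blend_inpainted(generated: Image, target: Image, mask: Image) -> Image:
    """Use generated pixels where mask is 1 and target pixels where mask is 0.

    Intermediate mask values give a linear mix of the two images.
    """
    if not (len(generated) == len(target) == len(mask)):
        raise ValueError("all inputs must have the same height")
    out: Image = []
    for g_row, t_row, m_row in zip(generated, target, mask):
        if not (len(g_row) == len(t_row) == len(m_row)):
            raise ValueError("all inputs must have the same width")
        out.append([m * g + (1.0 - m) * t for g, t, m in zip(g_row, t_row, m_row)])
    return out


# Hypothetical 2x3 example: the middle column plays the role of the face region.
target = [[0.2, 0.2, 0.2], [0.2, 0.2, 0.2]]
generated = [[0.8, 0.8, 0.8], [0.8, 0.8, 0.8]]
mask = [[0.0, 1.0, 0.0], [0.0, 0.5, 0.0]]
result = blend_inpainted(generated, target, mask)
```

Pixels outside the mask are copied from the target portrait, which is why the background and hair of the target survive the swap; a soft mask edge is one common way to reduce visible seams.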

Paper Structure

This paper contains 11 sections, 2 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: The pipeline of IP-Adapter [ye2023ip-adapter]
  • Figure 2: Samples generated by ControlNet [zhang2023adding] v1.1 Canny
  • Figure 3: Samples generated by ControlNetMediaPipeFace [face]
  • Figure 4: Samples generated with the facial guidance loss
  • Figure 5: Quantitative results compared with DiffFace