FuseAnyPart: Diffusion-Driven Facial Parts Swapping via Multiple Reference Images

Zheng Yu, Yaohua Wang, Siying Cui, Aixi Zhang, Wei-Long Zheng, Senzhang Wang

TL;DR

In FuseAnyPart, facial parts from different people are assembled into a complete face in latent space by the Mask-based Fusion Module, and the consolidated feature is then injected into the UNet of the diffusion model to create novel characters.

Abstract

Facial parts swapping aims to selectively transfer regions of interest from the source image onto the target image while keeping the rest of the target image unchanged. Most face swapping studies are designed specifically for full-face swapping and are either unable to swap individual facial parts or significantly limited in doing so, which hinders fine-grained and customized character designs. However, designing an approach specifically for facial parts swapping is challenging because it requires a reasonable fusion of features from multiple references that is both efficient and effective. To overcome this challenge, FuseAnyPart is proposed to facilitate the seamless "fuse-any-part" customization of the face. In FuseAnyPart, facial parts from different people are assembled into a complete face in latent space within the Mask-based Fusion Module. Subsequently, the consolidated feature is dispatched to the Addition-based Injection Module for fusion within the UNet of the diffusion model to create novel characters. Extensive experiments qualitatively and quantitatively validate the superiority and robustness of FuseAnyPart. Source code is available at https://github.com/Thomas-wyh/FuseAnyPart.

Paper Structure

This paper contains 17 sections, 5 equations, 14 figures, 2 tables.

Figures (14)

  • Figure 1: Results of facial parts swapping using the proposed FuseAnyPart at $512 \times 512$ resolution. The swapped face (central image) is generated by fusing the original face (top-left image) with three facial part reference images (bottom-left, top-right, bottom-right). Notably, FuseAnyPart can seamlessly blend facial parts from multiple reference images with significant differences in appearance, producing high-fidelity and natural-looking swapped faces.
  • Figure 2: Illustration of FuseAnyPart. The process begins with an open-set detector identifying a facial image to obtain various facial part masks. Following this, an image encoder uses these masks and the facial image to derive the corresponding facial part feature. These facial part features and masks are then fed into the Mask-based Fusion Module to piece together a complete face in latent space. Subsequently, the consolidated feature is dispatched to the Addition-based Injection Module for fusion within the UNet of the diffusion model.
  • Figure 3: Qualitative comparison of eyes swapping. Our method produces high-fidelity results that maintain the consistency of facial features while ensuring a natural appearance.
  • Figure 4: Qualitative comparison of multiple facial parts swapping with a single reference face. Our method can naturally replace multiple facial parts of one face with those of another and better preserve both the characteristics and the facial part shape. More results are presented in Fig. \ref{fig:qua_organ_app}.
  • Figure 5: Qualitative comparison of multiple facial parts swapping with multiple reference faces. Our method remains robust to different appearances of various reference facial parts.
  • ...and 9 more figures
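
The two-stage flow described in the Figure 2 caption (mask-based fusion, then addition-based injection) can be illustrated with a toy sketch. This is not the authors' implementation: the function names `mask_based_fusion` and `addition_based_injection`, the `scale` parameter, and the single-channel 2x2 feature maps are all assumptions made for illustration, and the actual modules operate on learned image-encoder features inside the UNet.

```python
from typing import List, Sequence

Grid = List[List[float]]      # toy H x W feature map (one channel for brevity)
Mask = List[List[int]]        # binary H x W mask, 1 marks a facial part region


def mask_based_fusion(base: Grid, parts: Sequence[Grid], masks: Sequence[Mask]) -> Grid:
    """Hypothetical fusion: copy each reference part feature into the base
    feature map wherever that part's mask is 1."""
    fused = [row[:] for row in base]  # copy so the base features stay untouched
    for feat, mask in zip(parts, masks):
        for i, mask_row in enumerate(mask):
            for j, inside in enumerate(mask_row):
                if inside:
                    fused[i][j] = feat[i][j]
    return fused


def addition_based_injection(hidden: Grid, fused: Grid, scale: float = 1.0) -> Grid:
    """Hypothetical injection: add the (scaled) fused feature element-wise
    to a UNet hidden feature map of the same shape."""
    return [[h + scale * f for h, f in zip(h_row, f_row)]
            for h_row, f_row in zip(hidden, fused)]


# Toy example: a target face plus two reference parts ("eyes" and "mouth").
base = [[0.0, 0.0], [0.0, 0.0]]
eyes = [[1.0, 1.0], [1.0, 1.0]]
mouth = [[2.0, 2.0], [2.0, 2.0]]
eye_mask = [[1, 1], [0, 0]]      # top row belongs to the eyes
mouth_mask = [[0, 0], [0, 1]]    # bottom-right cell belongs to the mouth

fused = mask_based_fusion(base, [eyes, mouth], [eye_mask, mouth_mask])
hidden = [[10.0, 10.0], [10.0, 10.0]]
injected = addition_based_injection(hidden, fused, scale=0.5)
```

In this sketch, regions not covered by any mask keep the target face's features, which mirrors the stated goal of leaving the rest of the target image unchanged.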