High-Fidelity Diffusion Face Swapping with ID-Constrained Facial Conditioning
Dailan He, Xiahong Wang, Shulun Wang, Guanglu Song, Bingqi Ma, Hao Shao, Yu Liu, Hongsheng Li
TL;DR
This work tackles diffusion-based face swapping, where preserving the source identity competes with matching the target's attributes. It introduces an identity-constrained diffusion framework with decoupled identity and attribute conditioning, built on a latent diffusion foundation and trained with a three-stage pipeline followed by a post-training refinement that adds identity and adversarial losses. The method uses ArcFace and DINOv2 for identity cues, SimSwap for attributes, and GLIGEN adapters for condition injection. It achieves state-of-the-art identity similarity and strong attribute fidelity on FFHQ, and generalizes robustly to stylized images. Quantitative and human evaluations show substantial improvements in Fréchet Inception Distance (FID), identity retrieval, and perceived fidelity, supporting the effectiveness of decoupled conditioning and staged optimization. Beyond high-fidelity face swapping, the approach offers a strategy that may apply broadly to multi-condition diffusion tasks in conditional image generation.
Abstract
Face swapping aims to seamlessly transfer a source facial identity onto a target while preserving target attributes such as pose and expression. Diffusion models, known for their superior generative capabilities, have recently shown promise in advancing face-swapping quality. This paper addresses two key challenges in diffusion-based face swapping: the need to prioritize identity preservation over target attributes, and the inherent conflict between identity and attribute conditioning. To tackle these issues, we introduce an identity-constrained attribute-tuning framework for face swapping that first ensures identity preservation and then fine-tunes for attribute alignment, achieved through decoupled condition injection. We further enhance fidelity by incorporating identity and adversarial losses in a post-training refinement stage. Our identity-constrained diffusion-based face-swapping model outperforms existing methods in both qualitative and quantitative evaluations, demonstrating superior identity similarity and attribute consistency and achieving new state-of-the-art performance in high-fidelity face swapping.
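The abstract does not give the exact form of the identity loss. A common formulation in the face-swapping literature is one minus the cosine similarity between face-recognition embeddings (for example, ArcFace features) of the source face and the swapped result. The sketch below illustrates that formulation only, as an assumption rather than the authors' implementation; the function names and the toy embedding vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def identity_loss(source_embedding, swapped_embedding):
    """Assumed identity loss: 1 - cosine similarity.

    The value is 0 when the two embeddings point in the same direction
    and grows (up to 2) as they diverge.
    """
    return 1.0 - cosine_similarity(source_embedding, swapped_embedding)


if __name__ == "__main__":
    # Toy 4-dimensional vectors standing in for real face embeddings.
    source = [0.2, 0.9, -0.1, 0.4]
    swapped = [0.25, 0.85, -0.05, 0.35]
    print(identity_loss(source, swapped))
```

In a real training loop the embeddings would come from a frozen face-recognition network applied to the source image and to the generated image, and this term would be weighted alongside the adversarial and diffusion objectives.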
