SelfSwapper: Self-Supervised Face Swapping via Shape Agnostic Masked AutoEncoder
Jaeseong Lee, Junha Hyung, Sohyun Jeong, Jaegul Choo
TL;DR
SelfSwapper introduces SAMAE, a self-supervised Shape-Agnostic Masked AutoEncoder for face swapping that avoids target identity leakage and improves cross-identity realism. By disentangling identity from non-identity attributes and leveraging 3DMM-based geometry, a foreground mask, and learnable skin/albedo representations, SAMAE enables robust cross-identity swaps; it further mitigates shape misalignment and volume discrepancies via perforation confusion and random mesh scaling. Empirically, SAMAE achieves state-of-the-art performance on standard benchmarks, with strong qualitative results and ablations validating the effectiveness of the proposed techniques. The approach offers a robust, generalizable framework for realistic, privacy-conscious face swapping with reduced identity leakage and well-preserved target illumination and geometry. The authors also acknowledge ethical considerations and potential future enhancements.
Abstract
Face swapping has gained significant attention for its varied applications. Most previous face swapping approaches have relied on the seesaw game training scheme, also known as the target-oriented approach. However, this often leads to unstable model training and to undesired samples with blended identities, caused by the target identity leakage problem. Source-oriented methods achieve more stable training with a self-reconstruction objective but often fail to accurately reflect the target image's skin color and illumination. This paper introduces the Shape Agnostic Masked AutoEncoder (SAMAE) training scheme, a novel self-supervised approach that combines the strengths of both target-oriented and source-oriented approaches. Our training scheme addresses the limitations of traditional training methods by circumventing the conventional seesaw game and introducing clear ground truth through its self-reconstruction training regime. Our model effectively mitigates identity leakage and reflects target albedo and illumination through learned disentangled identity and non-identity features. Additionally, we directly address the shape misalignment and volume discrepancy problems with new techniques, including perforation confusion and random mesh scaling. SAMAE establishes a new state of the art, surpassing baseline methods and preserving both identity and non-identity attributes without sacrificing either.
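To give a rough intuition for the random mesh scaling mentioned above, the following is a minimal illustrative sketch, not the authors' implementation. It assumes the augmentation amounts to uniformly rescaling a 3DMM mesh about its centroid by a randomly sampled factor, so that the model cannot rely on the mesh's apparent volume during self-reconstruction. The function name `random_mesh_scale`, the sampling range of 0.9 to 1.1, and the choice of uniform (isotropic) scaling are all assumptions made for illustration.

```python
import numpy as np


def random_mesh_scale(vertices, low=0.9, high=1.1, rng=None):
    """Rescale mesh vertices about their centroid by a random factor.

    Hypothetical helper: the range [low, high] and the isotropic
    scaling are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    vertices = np.asarray(vertices, dtype=np.float64)  # shape (N, 3)

    # Scale around the centroid so the mesh stays in place and only
    # its overall size (and hence its volume) changes.
    centroid = vertices.mean(axis=0, keepdims=True)
    scale = rng.uniform(low, high)
    scaled = centroid + scale * (vertices - centroid)
    return scaled, scale


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mesh = rng.normal(size=(5, 3))  # stand-in for 3DMM vertices
    scaled_mesh, factor = random_mesh_scale(mesh, rng=rng)
    print("scale factor:", factor)
    print("centroid preserved:",
          np.allclose(mesh.mean(axis=0), scaled_mesh.mean(axis=0)))
```

Scaling about the centroid keeps the mesh position fixed while changing its volume by the cube of the sampled factor, which is one simple way to expose a model to volume mismatches during training.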
