Zero-shot Face Editing via ID-Attribute Decoupled Inversion
Yang Hou, Minggu Wang, Jianjun Zhao
TL;DR
The paper addresses the challenge of preserving identity (ID) and structural fidelity when editing real faces with diffusion-based methods. It proposes ID-Attribute Decoupled Inversion, which represents identity with a whole-face embedding and attributes with text embeddings, and uses both as joint conditions for the inversion and reverse diffusion processes. A 69,900-pair face-attribute dataset and LoRA-based fine-tuning of Stable Diffusion enable zero-shot editing from text prompts alone, without region masks. Experiments on FFHQ and CelebA-HQ show better ID preservation, structural consistency, and editing quality than state-of-the-art baselines, with editing speed comparable to DDIM inversion.
Abstract
Recent advancements in text-guided diffusion models have shown promise for general image editing via inversion techniques, but often struggle to maintain ID and structural consistency in real face editing tasks. To address this limitation, we propose a zero-shot face editing method based on ID-Attribute Decoupled Inversion. Specifically, we decompose the face representation into ID and attribute features, using them as joint conditions to guide both the inversion and the reverse diffusion processes. This allows independent control over ID and attributes, ensuring strong ID preservation and structural consistency while enabling precise facial attribute manipulation. Our method supports a wide range of complex multi-attribute face editing tasks using only text prompts, without requiring region-specific input, and operates at a speed comparable to DDIM inversion. Comprehensive experiments demonstrate its practicality and effectiveness.
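The idea of using ID and attribute features as joint conditions for both inversion and reverse diffusion can be illustrated with a toy sketch. This is not the authors' implementation: the "image" is a small vector, the denoiser is a hypothetical random linear map with a tanh (standing in for the fine-tuned Stable Diffusion U-Net), and concatenating the two embeddings is an assumed way of forming the joint condition. Only the deterministic DDIM update itself is standard.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, ID_DIM, ATTR_DIM, STEPS = 16, 8, 8, 50

# Cumulative noise schedule: alpha_bar[0] = 1 (clean), alpha_bar[STEPS] is the noisiest level.
betas = np.linspace(1e-4, 2e-2, STEPS)
alpha_bar = np.concatenate([[1.0], np.cumprod(1.0 - betas)])

# Hypothetical stand-in for the denoising network (assumption, not the paper's model).
W_x = rng.normal(scale=0.05, size=(DIM, DIM))
W_c = rng.normal(scale=0.05, size=(DIM, ID_DIM + ATTR_DIM))

def predict_noise(x, t, id_emb, attr_emb):
    """Predict noise from the sample and the joint (ID, attribute) condition."""
    cond = np.concatenate([id_emb, attr_emb])  # assumed joint conditioning
    return np.tanh(W_x @ x + W_c @ cond + 0.01 * t)

def ddim_move(x, t_from, t_to, id_emb, attr_emb):
    """Deterministic DDIM update from timestep t_from to t_to (either direction)."""
    eps = predict_noise(x, t_from, id_emb, attr_emb)
    x0_pred = (x - np.sqrt(1.0 - alpha_bar[t_from]) * eps) / np.sqrt(alpha_bar[t_from])
    return np.sqrt(alpha_bar[t_to]) * x0_pred + np.sqrt(1.0 - alpha_bar[t_to]) * eps

def invert(x0, id_emb, attr_emb):
    """Inversion: walk the clean sample up to the noisy latent, guided by both conditions."""
    x = x0
    for t in range(STEPS):
        x = ddim_move(x, t, t + 1, id_emb, attr_emb)
    return x

def generate(x_T, id_emb, attr_emb):
    """Reverse diffusion: walk the latent back down, guided by both conditions."""
    x = x_T
    for t in range(STEPS, 0, -1):
        x = ddim_move(x, t, t - 1, id_emb, attr_emb)
    return x

face = rng.normal(size=DIM)           # toy "face image"
id_emb = rng.normal(size=ID_DIM)      # toy identity embedding
attr_src = rng.normal(size=ATTR_DIM)  # toy embedding of the source attribute prompt
attr_tgt = rng.normal(size=ATTR_DIM)  # toy embedding of the target attribute prompt

latent = invert(face, id_emb, attr_src)
recon = generate(latent, id_emb, attr_src)   # same conditions: approximate reconstruction
edited = generate(latent, id_emb, attr_tgt)  # same ID, new attributes: the edit

print("reconstruction error:", np.linalg.norm(recon - face))
print("edit displacement:   ", np.linalg.norm(edited - face))
```

In this sketch the identity embedding is held fixed across inversion, reconstruction, and editing, while only the attribute embedding is swapped at generation time. That mirrors the independent control over ID and attributes described in the abstract, under the stated simplifications.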
