Learning Feature-Preserving Portrait Editing from Generated Pairs
Bowei Chen, Tiancheng Zhi, Peihao Zhu, Shen Sang, Jing Liu, Linjie Luo
TL;DR
The paper tackles portrait editing, where the main challenge is preserving the subject's identity while applying the edit. It proposes a low-cost data-generation pipeline that produces aligned input-target pairs, and trains a Multi-Conditioned Diffusion Model (MCDM) that fuses multiple conditioning signals to learn the editing direction and guard against unwanted feature changes; a mask-guided inference step further protects subject details. Key contributions are the conditional data-generation strategy, the MCDM architecture with spatial, text, and image conditioning, and the automatic editing mask that guides inference. Experiments on costume editing and cartoon-expression editing report quantitative and user-study evidence of state-of-the-art quality and feature preservation, with ablations highlighting the importance of each component. The approach offers a practical, scalable way to produce high-quality, feature-preserving portrait edits, with potential applications in real-world editing pipelines.
Abstract
Portrait editing is challenging for existing techniques because they struggle to preserve subject features such as identity. In this paper, we propose a training-based method that leverages auto-generated paired data to learn the desired edit while preserving the subject features that should remain unchanged. Specifically, we design a data generation process that creates reasonably good training pairs for the desired edit at low cost. Based on these pairs, we introduce a Multi-Conditioned Diffusion Model to effectively learn the editing direction and preserve subject features. During inference, our model produces an accurate editing mask that can guide the inference process to further preserve detailed subject features. Experiments on costume editing and cartoon expression editing show that our method achieves state-of-the-art quality, both quantitatively and qualitatively.
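To make the idea of an editing mask concrete, here is a minimal sketch of one way a mask could protect subject features: a per-pixel blend that takes edited pixels where the mask is 1 and keeps original pixels where it is 0. This is an illustrative assumption, not the method described in the paper; the abstract does not specify how the mask enters the inference process, and the function name `blend_with_mask` and the list-of-lists image representation are hypothetical.

```python
from typing import List

# Hypothetical representation: a grayscale image as rows of floats in [0, 1].
Image = List[List[float]]


def blend_with_mask(original: Image, edited: Image, mask: Image) -> Image:
    """Blend per pixel: mask=1 takes the edited value, mask=0 keeps the original.

    Assumed illustration only; the paper's actual mask-guided inference
    may operate differently (for example, inside the diffusion process).
    """
    if not (len(original) == len(edited) == len(mask)):
        raise ValueError("inputs must have the same height")
    out: Image = []
    for o_row, e_row, m_row in zip(original, edited, mask):
        if not (len(o_row) == len(e_row) == len(m_row)):
            raise ValueError("inputs must have the same width")
        # Convex combination of edited and original pixel values.
        out.append([m * e + (1.0 - m) * o for o, e, m in zip(o_row, e_row, m_row)])
    return out


if __name__ == "__main__":
    original = [[0.2, 0.4]]
    edited = [[1.0, 0.0]]
    mask = [[1.0, 0.0]]  # edit the first pixel only
    print(blend_with_mask(original, edited, mask))
```

In this toy example the first pixel is replaced by its edited value and the second is left untouched, which is the behavior one would want for regions such as the face when only the costume is being edited.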
