SARA: Controllable Makeup Transfer with Spatial Alignment and Region-Adaptive Normalization
Xiaojing Zhong, Xinyi Huang, Zhonghua Wu, Guosheng Lin, Qingyao Wu
TL;DR
SARA addresses controllable makeup transfer under large spatial misalignment with three coordinated modules: a semantic-guided alignment module (SAM) that builds dense correspondences via unbalanced optimal transport (OT) and uses them to warp makeup styles, a region-adaptive normalization module (RAM) that decouples shape from makeup using region-specific style codes, and a makeup fusion module (MFM) that progressively fuses identity features with the warped makeup. The framework supports partial and shade-controllable makeup transfer, and it can remove makeup by running the transfer with a non-makeup reference. Training combines perceptual, makeup, cycle, adversarial, and identity losses, with pseudo ground truth generated from OT matching as supervision. Experiments on the Makeup Transfer dataset and the M-wild dataset show that SARA achieves state-of-the-art transfer fidelity, sharpness, and controllability while preserving identity under challenging misalignments and occlusions.
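To make the correspondence step concrete, the following is a minimal NumPy sketch of entropic unbalanced optimal transport (the standard Sinkhorn-style scaling iteration with KL-relaxed marginals), followed by warping reference features with the resulting plan. It is an illustration under stated assumptions, not the authors' implementation: the cosine cost, the uniform marginals, the hyperparameters `eps` and `rho`, and the function names `cosine_cost`, `unbalanced_sinkhorn`, and `warp_features` are all hypothetical choices made for this example.

```python
import numpy as np

def cosine_cost(src_feats, ref_feats):
    """Cost matrix 1 - cosine similarity between rows of src and ref (assumed cost)."""
    src = src_feats / (np.linalg.norm(src_feats, axis=1, keepdims=True) + 1e-12)
    ref = ref_feats / (np.linalg.norm(ref_feats, axis=1, keepdims=True) + 1e-12)
    return 1.0 - src @ ref.T

def unbalanced_sinkhorn(cost, a, b, eps=0.05, rho=1.0, n_iters=200):
    """Entropic unbalanced OT via scaling iterations.

    eps is the entropic regularization strength; rho weights the KL penalty
    on the marginals, so mass need not be matched exactly.
    """
    K = np.exp(-cost / eps)              # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    power = rho / (rho + eps)            # exponent < 1 softens the marginal constraints
    for _ in range(n_iters):
        u = (a / (K @ v + 1e-16)) ** power
        v = (b / (K.T @ u + 1e-16)) ** power
    return u[:, None] * K * v[None, :]   # transport plan, shape (n_src, n_ref)

def warp_features(plan, ref_feats):
    """Warp reference features to source positions as a plan-weighted average."""
    weights = plan / (plan.sum(axis=1, keepdims=True) + 1e-16)
    return weights @ ref_feats

# Toy usage: 5 source positions, 7 reference positions, 8-dim features.
rng = np.random.default_rng(0)
src = rng.normal(size=(5, 8))
ref = rng.normal(size=(7, 8))
plan = unbalanced_sinkhorn(cosine_cost(src, ref),
                           np.full(5, 1 / 5), np.full(7, 1 / 7))
warped = warp_features(plan, ref)        # shape (5, 8), aligned to the source
```

Relaxing the marginals is what lets unmatched regions (for example, occluded areas) receive little or no transported mass instead of being forced into a correspondence.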
Abstract
Makeup transfer is the process of transferring the makeup style of a reference image to a source image while preserving the source identity. The technique is highly desirable and has many applications. However, existing methods lack fine-grained control over the makeup style and struggle to produce high-quality results under large spatial misalignments. To address this problem, we propose a novel Spatial Alignment and Region-Adaptive normalization method (SARA). Our method produces detailed makeup transfer results, handles large spatial misalignments, and supports part-specific and shade-controllable makeup transfer. Specifically, SARA comprises three modules: first, a spatial alignment module that preserves the spatial context of makeup and provides a target semantic map for guiding the shape-independent style codes; second, a region-adaptive normalization module that decouples shape and makeup style using per-region encoding and normalization, which helps eliminate spatial misalignments; and third, a makeup fusion module that blends identity features and makeup style by injecting learned scale and bias parameters. Experimental results show that SARA outperforms existing methods and achieves state-of-the-art performance on two public datasets.
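The per-region normalization idea described above can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions rather than the paper's module: features are normalized within each semantic region and then modulated by a region-specific scale and bias. Here `gamma` and `beta` are passed in directly, whereas in the paper they are learned from style codes; the names `region_style_codes` and `region_adaptive_norm` are hypothetical, and average pooling stands in for the learned per-region encoder.

```python
import numpy as np

def region_style_codes(ref_feats, ref_mask, n_regions):
    """Average-pool reference features (C, H, W) inside each region -> (R, C).

    Assumption: pooling is a stand-in for a learned per-region style encoder.
    """
    codes = np.zeros((n_regions, ref_feats.shape[0]))
    for r in range(n_regions):
        sel = ref_mask == r
        if sel.any():
            codes[r] = ref_feats[:, sel].mean(axis=1)
    return codes

def region_adaptive_norm(feats, mask, gamma, beta, eps=1e-5):
    """Normalize feats (C, H, W) per region, then apply per-region scale and bias.

    mask  : (H, W) integer region labels (e.g. lips, eyes, skin)
    gamma : (R, C) per-region scale
    beta  : (R, C) per-region bias
    """
    out = np.zeros_like(feats, dtype=float)
    for r in range(gamma.shape[0]):
        sel = mask == r
        if not sel.any():
            continue                                   # region absent in this image
        x = feats[:, sel]                              # (C, N_r) pixels of region r
        mean = x.mean(axis=1, keepdims=True)
        std = x.std(axis=1, keepdims=True)
        out[:, sel] = gamma[r][:, None] * (x - mean) / (std + eps) + beta[r][:, None]
    return out

# Toy usage: 3 channels, 4x4 map, left half is region 0 and right half is region 1.
rng = np.random.default_rng(0)
feats = rng.normal(size=(3, 4, 4))
mask = np.zeros((4, 4), dtype=int)
mask[:, 2:] = 1
gamma = np.array([[2.0, 2.0, 2.0], [0.5, 0.5, 0.5]])
beta = np.array([[1.0, 1.0, 1.0], [-1.0, -1.0, -1.0]])
out = region_adaptive_norm(feats, mask, gamma, beta)
```

Because the statistics are computed inside each region of the target semantic map, the style parameters carry no information about where the region sits or what shape it has, which is the sense in which shape and makeup are decoupled. Scaling `gamma` and `beta` for a single region is one plausible way to picture part-specific or shade-controlled transfer.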
