SARA: Controllable Makeup Transfer with Spatial Alignment and Region-Adaptive Normalization

Xiaojing Zhong; Xinyi Huang; Zhonghua Wu; Guosheng Lin; Qingyao Wu

TL;DR

SARA addresses controllable makeup transfer under large spatial misalignment with three coordinated modules: a semantic-guided alignment module (SAM) that builds dense correspondences via unbalanced optimal transport (OT) to warp makeup styles, a region-adaptive normalization module (RAM) that decouples shape and makeup using region-specific style codes, and a makeup fusion module (MFM) that progressively fuses identity features with the warped makeup. The framework supports partial and shade-controllable makeup transfer, and it can remove makeup by reversing the transfer with a non-makeup reference. The training objective combines perceptual, makeup, cycle, adversarial, and identity losses, with pseudo ground truth generated from OT matching used for supervision. Experiments on the Makeup Transfer dataset and the M-wild dataset show that SARA achieves state-of-the-art results in transfer fidelity, sharpness, and controllability, while preserving identity under challenging misalignments and occlusions.
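The OT-based matching summarized above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes an entropic unbalanced OT formulation with KL-relaxed marginals solved by Sinkhorn-style scaling iterations, a cosine-distance cost, and uniform marginals. The function names, `eps`, `rho`, and the toy feature sizes are all hypothetical choices made for illustration.

```python
import numpy as np

def unbalanced_sinkhorn(cost, a, b, eps=0.1, rho=1.0, n_iters=200):
    """Entropic unbalanced OT plan (assumed formulation, not from the paper).

    cost: (n, m) pairwise cost; a: (n,) and b: (m,) marginal weights.
    eps is the entropic regularization, rho the KL marginal relaxation.
    """
    K = np.exp(-cost / eps)            # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    power = rho / (rho + eps)          # exponent < 1 softens the marginal constraints
    for _ in range(n_iters):
        u = (a / (K @ v + 1e-12)) ** power
        v = (b / (K.T @ u + 1e-12)) ** power
    return u[:, None] * K * v[None, :]

def warp_with_plan(plan, ref_feats):
    """Warp reference features to source positions with row-normalized plan weights."""
    weights = plan / (plan.sum(axis=1, keepdims=True) + 1e-12)
    return weights @ ref_feats

# Toy example: 6 source positions, 8 reference positions, 4-dim features.
rng = np.random.default_rng(0)
src = rng.normal(size=(6, 4))
ref = rng.normal(size=(8, 4))
src_n = src / np.linalg.norm(src, axis=1, keepdims=True)
ref_n = ref / np.linalg.norm(ref, axis=1, keepdims=True)
cost = 1.0 - src_n @ ref_n.T           # cosine distance, values in [0, 2]
a = np.full(6, 1.0 / 6)
b = np.full(8, 1.0 / 8)

plan = unbalanced_sinkhorn(cost, a, b)
warped = warp_with_plan(plan, ref)
```

Each row of `warped` is a weighted average of reference features, so a source position draws makeup from the reference positions that the transport plan matches it to. Relaxing the marginals lets the plan leave poorly matched mass untransported rather than forcing every reference position onto the source.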

Abstract

Makeup transfer applies the makeup style of a reference image to source images while preserving the source images' identities. The technique is highly desirable and has many applications. However, existing methods lack fine-grained control over the makeup style, which makes high-quality results hard to achieve under large spatial misalignments. To address this problem, we propose SARA, a novel Spatial Alignment and Region-Adaptive normalization method. SARA generates detailed makeup transfer results, handles large spatial misalignments, and supports part-specific and shade-controllable makeup transfer. It comprises three modules. First, a spatial alignment module preserves the spatial context of the makeup and provides a target semantic map that guides the shape-independent style codes. Second, a region-adaptive normalization module decouples shape and makeup style through per-region encoding and normalization, which helps eliminate spatial misalignments. Third, a makeup fusion module blends identity features with the makeup style by injecting learned scale and bias parameters. Experimental results show that SARA outperforms existing methods and achieves state-of-the-art performance on two public datasets.
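The per-region encoding and scale/bias injection described in the abstract can be sketched as follows. This is a simplified NumPy illustration under stated assumptions, not the paper's network: the region labels, the instance-style normalization, and the linear maps `w_scale` and `w_bias` (stand-ins for learned layers) are hypothetical.

```python
import numpy as np

def region_style_codes(feats, ref_mask, n_regions):
    """Average-pool features (C, H, W) inside each labelled region of ref_mask (H, W)."""
    codes = np.zeros((n_regions, feats.shape[0]))
    for k in range(n_regions):
        sel = ref_mask == k
        if sel.any():
            codes[k] = feats[:, sel].mean(axis=1)   # one shape-independent code per region
    return codes

def broadcast_codes(codes, target_mask):
    """Paint each region's code over the target layout, giving a (C, H, W) style matrix."""
    return codes[target_mask].transpose(2, 0, 1)

def modulate(identity, style, w_scale, w_bias, eps=1e-5):
    """Normalize identity features per channel, then inject style-derived scale and bias."""
    mean = identity.mean(axis=(1, 2), keepdims=True)
    std = identity.std(axis=(1, 2), keepdims=True)
    normed = (identity - mean) / (std + eps)
    scale = np.einsum("dc,chw->dhw", w_scale, style)  # hypothetical linear projection
    bias = np.einsum("dc,chw->dhw", w_bias, style)    # hypothetical linear projection
    return normed * (1.0 + scale) + bias

# Toy example: region 0 and region 1 have different layouts in reference and target.
ref_mask = np.array([[0, 0, 1, 1]] * 4)                  # left/right split
target_mask = np.array([[0] * 4, [0] * 4, [1] * 4, [1] * 4])  # top/bottom split
feats = np.stack([1 + 2 * ref_mask, 5 - ref_mask]).astype(float)

codes = region_style_codes(feats, ref_mask, n_regions=2)
style = broadcast_codes(codes, target_mask)

rng = np.random.default_rng(0)
identity = rng.normal(size=(2, 4, 4))
out = modulate(identity, style, w_scale=0.1 * np.eye(2), w_bias=0.1 * np.eye(2))
```

Because the codes are pooled over a region and then repainted onto a different mask, the style matrix follows the target layout rather than the reference shape, which is the sense in which the codes are shape-independent.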

Paper Structure

This paper contains 21 sections, 16 equations, 11 figures, 2 tables.

Introduction
Related Work
Makeup Transfer
Style Transfer
Methodology
Problem Formulation and Notations
Network Structure
Semantic-guided Alignment Module
Region-Adaptive Normalization Module
Makeup Fusion Module
Loss Functions
Experiments
Implementation Setting and Datasets
Qualitative Results
Quantitative Results
...and 6 more sections

Figures (11)

Figure 1: SARA supports flexible operations. (a) SARA enables pose-robust transfer under the guidance of semantic alignment. (b) Users are allowed to select partial makeup styles from the reference image. (c) SARA can adjust the degree of the makeup styles. For best results, zoom in.
Figure 2: Overview of our proposed method, which has three main modules. (i) The semantic-guided alignment module warps the reference image $y_r$ and the partial reference semantic map $x_{rk}^l$ to align with the source semantic map $x_s^l$ by estimating the dense correspondence between $y_r$ and $x_s^l$, where $k \in \{lip, skin, eyes\}$. (ii) The region-adaptive normalization module decouples the shape and style of makeup via a region-wise average pooling layer and broadcasts the shape-independent style codes to the target semantic map $W_{x_{rk}^l}$ to generate the style matrix $ST$. The modulation parameters are dynamically combined from $W_{y_r}^{out}$ and $ST$. (iii) The makeup fusion module progressively fuses makeup styles with the identity features to generate the fine-grained result $\hat{y}_s$.
Figure 3: The comparison of warped results with different matching. The third column depicts cosine matching results, while the fourth shows those from OT matching. The warped results generated from OT matching preserve more intricate makeup features, such as blusher.
Figure 4: Dynamic combination in region-adaptive normalization. The scale and bias parameters $\alpha$ and $\beta$ are weighted combinations of the warped output $W^{out}_{y_r}$ and the style matrix $ST$.
Figure 5: Comparison of generated pseudo ground truth between EleGANt (Yang et al., 2022) and ours.
...and 6 more figures
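The weighting described in the Figure 4 caption can be pictured with a short sketch. The captions do not give the exact form of the combination, so this assumes a convex blend whose weight comes from a sigmoid over a scalar logit. `combine_params` and `logit` are hypothetical names, and in a trained model the weight would be learned rather than set by hand.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def combine_params(from_warped, from_style, logit):
    """Convex blend of two parameter maps (assumed form, not taken from the paper)."""
    w = sigmoid(logit)                 # blend weight in (0, 1)
    return w * from_warped + (1.0 - w) * from_style

# Toy scale maps derived from the warped output and from the style matrix.
alpha_warped = np.full((2, 4, 4), 2.0)
alpha_style = np.full((2, 4, 4), 4.0)
alpha = combine_params(alpha_warped, alpha_style, logit=0.0)   # equal weighting
```

The same blend would apply to the bias parameter $\beta$. A logit of zero weights both sources equally, while a large positive logit favors the spatially detailed warped output over the region-level style matrix.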
