SHMT: Self-supervised Hierarchical Makeup Transfer via Latent Diffusion Models

Zhaoyang Sun, Shengwu Xiong, Yaxiong Chen, Fei Du, Weihua Chen, Fan Wang, Yi Rong

TL;DR

A novel Self-supervised Hierarchical Makeup Transfer (SHMT) method via latent diffusion models that works in a self-supervised manner, freeing itself from the misguidance of imprecise pseudo-paired data.

Abstract

This paper studies the challenging task of makeup transfer, which aims to apply diverse makeup styles precisely and naturally to a given facial image. Due to the absence of paired data, current methods typically synthesize sub-optimal pseudo ground truths to guide the model training, resulting in low makeup fidelity. Additionally, different makeup styles generally have varying effects on a person's face, but existing methods struggle to deal with this diversity. To address these issues, we propose a novel Self-supervised Hierarchical Makeup Transfer (SHMT) method via latent diffusion models. Following a "decoupling-and-reconstruction" paradigm, SHMT works in a self-supervised manner, freeing itself from the misguidance of imprecise pseudo-paired data. Furthermore, to accommodate a variety of makeup styles, hierarchical texture details are decomposed via a Laplacian pyramid and selectively introduced to the content representation. Finally, we design a novel Iterative Dual Alignment (IDA) module that dynamically adjusts the injection condition of the diffusion model, allowing the alignment errors caused by the domain gap between content and makeup representations to be corrected. Extensive quantitative and qualitative analyses demonstrate the effectiveness of our method. Our code is available at https://github.com/Snowfallingplum/SHMT.
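
The Laplacian pyramid mentioned in the abstract can be illustrated with a minimal sketch. This is not the SHMT implementation: the function names, the 2x2 average pooling used in place of a Gaussian filter, the nearest-neighbour upsampling, and the choice of three levels are all assumptions made for illustration. The sketch only shows the general idea that an image splits into band-pass detail layers (here called `details`, loosely corresponding to the hierarchical texture details) plus a coarse residual, and that keeping or dropping individual layers is what makes the detail selective.

```python
import numpy as np

def downsample(img):
    """Halve resolution by 2x2 average pooling (assumed stand-in for Gaussian blur + subsampling)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(img):
    """Double resolution by nearest-neighbour repetition (assumed, for simplicity)."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def laplacian_pyramid(img, levels):
    """Split img into `levels` detail layers (fine to coarse) and a coarse residual."""
    details = []
    current = img.astype(np.float64)
    for _ in range(levels):
        smaller = downsample(current)
        # Detail layer = what is lost when going one level coarser.
        details.append(current - upsample(smaller))
        current = smaller
    return details, current

def reconstruct(details, residual):
    """Invert the decomposition by adding detail layers back, coarse to fine."""
    current = residual
    for detail in reversed(details):
        current = upsample(current) + detail
    return current

# Hypothetical grayscale input; height and width must be divisible by 2**levels.
rng = np.random.default_rng(0)
image = rng.random((64, 64))
details, residual = laplacian_pyramid(image, levels=3)
restored = reconstruct(details, residual)
```

Adding every layer back recovers the input up to floating-point error, while replacing a fine layer with zeros before reconstruction removes the corresponding high-frequency texture. How SHMT actually selects and injects these layers is described in the paper and its repository, not in this sketch.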

Paper Structure

This paper contains 29 sections, 7 equations, 17 figures, 4 tables.

Figures (17)

  • Figure 1: Illustration of two main difficulties in the makeup transfer task. (a) Due to the absence of paired data, previous methods utilize histogram matching or geometric distortion to synthesize sub-optimal pseudo-paired data, which inevitably misguide the model training. (b) Some source content details should be preserved in simple makeup styles but be removed in complex ones.
  • Figure 2: In addition to color matching, our approach allows flexible control to preserve or discard texture details for various makeup styles, without changing the facial shape.
  • Figure 3: The framework of SHMT. A facial image $I$ is decomposed into background area $I_{bg}$, makeup representation $I_{m}$, and content representation ($I_{3d}$, $h_{i}$). The makeup transfer procedure is simulated by reconstructing the original image from these components. Hierarchical texture details $h_{i}$ are constructed to respond to different makeup styles. In each denoising step $t$, IDA draws on the noisy intermediate result $\hat{I}_{t}$ to dynamically adjust the injection condition to correct alignment errors.
  • Figure 4: Qualitative comparison with GAN-based baselines on simple makeup styles.
  • Figure 5: Qualitative comparison with GAN-based baselines on complex makeup styles.
  • ...and 12 more figures