
Realtime Data-Efficient Portrait Stylization Based On Geometric Alignment

Xinrui Wang, Zhuoru Li, Xiao Zhou, Yusuke Iwasawa, Yutaka Matsuo

TL;DR

Quantitative and qualitative comparisons on a range of portrait stylization tasks demonstrate that integrating Thin-Plate-Spline (TPS) modules into an end-to-end Generative Adversarial Network (GAN) framework yields models that not only outperform existing models in fidelity and stylistic consistency, but also achieve 2× higher training data efficiency and 100× lower computational complexity.

Abstract

Portrait stylization aims to imbue portrait photos with vivid artistic effects drawn from style examples. Despite the availability of enormous training datasets and large networks, existing methods struggle to maintain geometric consistency and achieve satisfactory stylization effects, because facial features are distributed differently in facial photographs and in stylized images. This limits their application to rare styles and mobile devices. To alleviate this, we propose to establish meaningful geometric correlations between portraits and style samples, simplifying stylization by aligning corresponding facial characteristics. Specifically, we integrate differentiable Thin-Plate-Spline (TPS) modules into an end-to-end Generative Adversarial Network (GAN) framework to improve training efficiency and promote the consistency of facial identities. By leveraging inherent structural information of faces, e.g., facial landmarks, the TPS modules establish geometric alignments between the two domains at global and local scales, in both pixel and feature spaces, thereby overcoming the aforementioned challenges. Quantitative and qualitative comparisons on a range of portrait stylization tasks demonstrate that our models not only outperform existing models in terms of fidelity and stylistic consistency, but also achieve 2× higher training data efficiency and 100× lower computational complexity. This allows our lightweight model to run real-time inference (30 FPS) at 512×512 resolution on mobile devices.
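To illustrate the landmark-driven alignment described above, here is the classic 2D thin-plate-spline warp in NumPy. It is a minimal sketch and not the authors' implementation. In the paper, the TPS modules are differentiable layers inside the GAN and operate on both pixels and features. This sketch only fits a TPS between two (N, 2) landmark sets in (x, y) order and backward-warps an image with nearest-neighbour sampling. The helper names fit_tps, apply_tps and warp_image, the reg smoothing parameter, and the sampling scheme are all hypothetical choices made for illustration.

```python
import numpy as np


def _tps_kernel(r2):
    # Radial basis U(r) = r^2 log(r^2); at r = 0 we use log(1) = 0 so U(0) = 0.
    return r2 * np.log(np.where(r2 > 0, r2, 1.0))


def fit_tps(ctrl, values, reg=0.0):
    """Fit f: R^2 -> R^2 with f(ctrl[i]) = values[i] (exactly when reg == 0).

    Hypothetical helper. ctrl and values are (N, 2) arrays of (x, y) points,
    not all collinear. Returns (ctrl, w, a): control points, RBF weights
    (N, 2), and affine coefficients (3, 2).
    """
    ctrl = np.asarray(ctrl, dtype=float)
    values = np.asarray(values, dtype=float)
    n = ctrl.shape[0]
    r2 = np.sum((ctrl[:, None, :] - ctrl[None, :, :]) ** 2, axis=-1)
    K = _tps_kernel(r2) + reg * np.eye(n)
    P = np.hstack([np.ones((n, 1)), ctrl])  # rows [1, x, y]
    # Standard TPS system: [[K, P], [P^T, 0]] [w; a] = [values; 0]
    L = np.zeros((n + 3, n + 3))
    L[:n, :n] = K
    L[:n, n:] = P
    L[n:, :n] = P.T
    rhs = np.zeros((n + 3, 2))
    rhs[:n] = values
    params = np.linalg.solve(L, rhs)
    return ctrl, params[:n], params[n:]


def apply_tps(model, pts):
    """Evaluate a fitted TPS at arbitrary (M, 2) points."""
    ctrl, w, a = model
    pts = np.asarray(pts, dtype=float)
    r2 = np.sum((pts[:, None, :] - ctrl[None, :, :]) ** 2, axis=-1)
    P = np.hstack([np.ones((len(pts), 1)), pts])
    return P @ a + _tps_kernel(r2) @ w


def warp_image(img, portrait_lms, style_lms):
    """Backward-warp img (H, W[, C]) so portrait landmarks land on style landmarks."""
    h, w = img.shape[:2]
    # Map each output (style-frame) pixel back to a portrait-frame coordinate.
    model = fit_tps(style_lms, portrait_lms)
    ys, xs = np.mgrid[0:h, 0:w]
    grid = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)
    src = apply_tps(model, grid)
    # Nearest-neighbour sampling, clamped to the image border.
    sx = np.clip(np.rint(src[:, 0]), 0, w - 1).astype(int)
    sy = np.clip(np.rint(src[:, 1]), 0, h - 1).astype(int)
    return img[sy, sx].reshape(img.shape)
```

Two design points are worth noting. Fitting the spline from style landmarks to portrait landmarks, rather than the other way round, gives a backward mapping, so every output pixel receives exactly one sample. A differentiable version would replace the rounding step with bilinear sampling.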

Paper Structure

This paper contains 18 sections, 10 equations, 13 figures, and 3 tables.

Figures (13)

  • Figure 2: Effect of geometric alignment in stylization. We use Mean Landmark Distance (MLD) to measure geometric similarity between portrait-style image pairs, and Fréchet Inception Distance (FID)/Art-FID to evaluate stylization quality. Stylization quality improves as portraits and style images become more geometrically similar. MLD, FID and Art-FID are described in the Experiment Setup section. Our experiments also show that geometric alignment enables smaller models to achieve results comparable to those of larger models without alignment. VGG11 (9M w/o head) and VGG19 (20M w/o head) are used as the small and large models for exemplar-based neural style transfer (NST). The lightweight CycleGAN variant (1.73M) and CycleGAN (10.85M) are used as the small and large models for GAN-based style transfer.
  • Figure 3: Illustration of identity distortion in portrait stylization tasks. All methods are trained on limited unpaired datasets.
  • Figure 4: Overview of our framework. The proposed cycle-consistency framework involves two transformation directions: portrait-to-style and vice versa. The portrait-to-style transformation comprises two branches: (1) the geometric warping branch, where the generator $G_{p2s}$ warps features from the portrait image $I_p$ using facial landmarks of the style image, synthesizing the aligned stylized result $I^{warp}_{p2s}$; and (2) the geometric invariant branch, where $G_{p2s}$ directly synthesizes the unaligned result $I_{p2s}$. The style image $I_s$ is warped with landmarks of the portrait to obtain $I^{warp}_s$. To constrain stylization, ($I^{warp}_{p2s}$, $I_s$) and ($I_{p2s}$, $I^{warp}_s$) are fed into the discriminator $D_s$, and $I_{p2s}$ is fed into the local stylization discriminator module. A cycle-consistency loss computed with a frozen LPIPS network constrains content.
  • Figure 5: Illustration of the local stylization discriminator (D) module. It uses a face bank containing cropped facial characteristics (eyes, nose, and mouth) from style images. During training, style patches are randomly sampled, geometrically aligned with the corresponding facial characteristics of portrait images, and fed into 4 auxiliary discriminators along with portrait patches. These discriminators enhance the stylization quality of facial characteristics.
  • Figure 6: Overview of our dataset. We show examples from 4 different styles, from top to bottom: Animation (N=800), Watercolor (N=64), Oilpaint (N=300) and Inkpaint (N=34). In the right part, we illustrate the landmarks used for TPS warping and the facial characteristics regions cropped out for local stylization.
  • ...and 8 more figures
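Figure 2 measures geometric similarity with Mean Landmark Distance (MLD). The exact definition is given in the paper's experiment setup section, which is not reproduced here. The sketch below is therefore only an illustration under an assumed definition: the mean Euclidean distance between corresponding landmarks, optionally divided by the image side length. The function name mean_landmark_distance and the image_size normalization are hypothetical.

```python
import numpy as np


def mean_landmark_distance(lms_a, lms_b, image_size=None):
    """Assumed MLD: mean Euclidean distance between corresponding landmarks.

    lms_a, lms_b: arrays of shape (N, 2) for one pair, or (B, N, 2) for a
    batch of portrait-style pairs, in pixel (x, y) coordinates.
    image_size: optional side length used to normalize distances (assumption).
    """
    a = np.asarray(lms_a, dtype=float)
    b = np.asarray(lms_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("landmark arrays must have the same shape")
    # Distance per landmark (and per pair, for batched input).
    d = np.linalg.norm(a - b, axis=-1)
    if image_size is not None:
        d = d / float(image_size)
    # Average over all landmarks and all pairs.
    return float(d.mean())
```

For example, `mean_landmark_distance([[0, 0], [0, 0]], [[3, 4], [6, 8]])` averages the per-landmark distances 5 and 10 to 7.5. Under this definition, a lower MLD means the portrait and style faces are more geometrically similar.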