Realtime Data-Efficient Portrait Stylization Based On Geometric Alignment
Xinrui Wang, Zhuoru Li, Xiao Zhou, Yusuke Iwasawa, Yutaka Matsuo
TL;DR
Quantitative and qualitative comparisons on a range of portrait stylization tasks demonstrate that the integrated Thin-Plate-Spline modules integrated into an end-to-end Generative Adversarial Network framework not only outperforms existing models in terms of fidelity and stylistic consistency, but also achieves remarkable improvements in 2× training data efficiency and 100× less computational complexity.
Abstract
Portrait Stylization aims to imbue portrait photos with vivid artistic effects drawn from style examples. Despite the availability of enormous training datasets and large network weights, existing methods struggle to maintain geometric consistency and achieve satisfactory stylization effects due to the disparity in facial feature distributions between facial photographs and stylized images, limiting the application on rare styles and mobile devices. To alleviate this, we propose to establish meaningful geometric correlations between portraits and style samples to simplify the stylization by aligning corresponding facial characteristics. Specifically, we integrate differentiable Thin-Plate-Spline (TPS) modules into an end-to-end Generative Adversarial Network (GAN) framework to improve the training efficiency and promote the consistency of facial identities. By leveraging inherent structural information of faces, e.g., facial landmarks, TPS module can establish geometric alignments between the two domains, at global and local scales, both in pixel and feature spaces, thereby overcoming the aforementioned challenges. Quantitative and qualitative comparisons on a range of portrait stylization tasks demonstrate that our models not only outperforms existing models in terms of fidelity and stylistic consistency, but also achieves remarkable improvements in 2x training data efficiency and 100x less computational complexity, allowing our lightweight model to achieve real-time inference (30 FPS) at 512*512 resolution on mobile devices.
