Learning Spatially Decoupled Color Representations for Facial Image Colorization

Hangyan Zhu; Ming Liu; Chao Zhou; Zifei Yan; Kuanquan Wang; Wangmeng Zuo

TL;DR

This work presents FCNet, a facial colorization framework that decouples color across facial components using face-parsing priors, enabling precise component-wise control. A color representation branch extracts per-component color codes from reference images, while a colorization network fuses these codes with the grayscale input; a chromatic-spatial augmentation strategy enforces component-specific color mappings. The approach supports single- and multi-reference colorization and extends to no-reference scenarios via two additional modules: one predicts the color codes automatically from the grayscale input, and the other samples diverse codes from a random seed using normalizing flows. Experiments on FFHQ and CelebA-HQ show state-of-the-art or competitive performance in FID, PSNR, SSIM, and qualitative assessments, with ablations confirming the value of the decoupled representation, augmentation, and grouped design. Overall, FCNet delivers controllable, vibrant, and realistic facial colorization with flexible application modes; the authors state that the source code and pre-trained models will be made publicly available.

Abstract

Image colorization methods have shown prominent performance on natural images. However, since humans are more sensitive to faces, existing methods are insufficient to meet the demands when applied to facial images, typically showing unnatural and uneven colorization results. In this paper, we investigate the facial image colorization task and find that the problems with facial images can be attributed to an insufficient understanding of facial components. As a remedy, by introducing facial component priors, we present a novel facial image colorization framework dubbed FCNet. Specifically, we learn a decoupled color representation for each face component (e.g., lips, skin, eyes, and hair) under the guidance of face parsing maps. A chromatic and spatial augmentation strategy is presented to facilitate the learning procedure, which requires only grayscale and color facial image pairs. After training, the presented FCNet can be naturally applied to facial image colorization with single or multiple reference images. To expand the application paradigms to scenarios with no reference images, we further train two alternative modules, which predict the color representations from the grayscale input or a random seed, respectively. Extensive experiments show that our method can perform favorably against existing methods in various application scenarios (i.e., no-, single-, and multi-reference facial image colorization). The source code and pre-trained models will be publicly available.
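
To make the idea of a per-component color representation concrete, the following is a minimal sketch, not FCNet's learned encoder. It assumes a hand-crafted stand-in: each component's "color code" is simply the mean chroma of the reference pixels that the parsing map assigns to that component. The function names (`component_color_codes`, `mix_codes`, `broadcast_codes`), the two-channel chroma layout, and the label indices are all hypothetical and chosen only for illustration.

```python
import numpy as np

def component_color_codes(chroma, parsing, num_components):
    """Average the chroma of each parsed component (illustrative stand-in
    for a learned color encoder).

    chroma:  (H, W, C) float array, e.g. the two chroma channels of a reference.
    parsing: (H, W) int array of component labels in [0, num_components).
    Returns a (num_components, C) array of codes and a boolean array marking
    which components actually appear in the parsing map.
    """
    channels = chroma.shape[-1]
    codes = np.zeros((num_components, channels), dtype=np.float64)
    present = np.zeros(num_components, dtype=bool)
    for k in range(num_components):
        mask = parsing == k
        if mask.any():
            codes[k] = chroma[mask].mean(axis=0)
            present[k] = True
    return codes, present

def mix_codes(codes_a, codes_b, take_from_b):
    """Multi-reference mixing: start from reference A and overwrite the
    listed components with the codes of reference B."""
    mixed = codes_a.copy()
    mixed[take_from_b] = codes_b[take_from_b]
    return mixed

def broadcast_codes(codes, parsing):
    """Paint each component's code back onto the target's parsing map,
    giving an (H, W, C) spatial color hint."""
    return codes[parsing]

# Toy example: label 0 covers the top row, label 1 the bottom row.
parsing = np.array([[0, 0],
                    [1, 1]])
chroma_a = np.zeros((2, 2, 2))
chroma_a[0] = [0.2, 0.4]   # component 0 in reference A
chroma_a[1] = [0.6, 0.8]   # component 1 in reference A
chroma_b = np.ones((2, 2, 2))  # reference B is uniformly 1.0

codes_a, present_a = component_color_codes(chroma_a, parsing, num_components=2)
codes_b, _ = component_color_codes(chroma_b, parsing, num_components=2)

# Keep component 0 from A, take component 1 from B.
mixed = mix_codes(codes_a, codes_b, take_from_b=[1])
hint = broadcast_codes(mixed, parsing)
```

Because the codes are indexed by component rather than by pixel location, swapping one row of the code matrix changes only that component's color, which is the kind of component-wise control the abstract describes. In the actual method the codes are learned representations fused inside a colorization network, so this sketch only illustrates the data flow.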
