Learning Spatially Decoupled Color Representations for Facial Image Colorization

Hangyan Zhu, Ming Liu, Chao Zhou, Zifei Yan, Kuanquan Wang, Wangmeng Zuo

TL;DR

This work presents FCNet, a facial colorization framework that decouples color across facial components using face-parsing priors, enabling component-wise control. A color representation branch extracts per-component color codes from reference images, and a colorization network fuses these codes with the grayscale input; a chromatic and spatial augmentation strategy encourages each code to control only its own component. The approach supports single- and multi-reference colorization and extends to no-reference scenarios through two additional modules: one predicts the color representation from the grayscale input (automatic colorization), and one samples diverse representations with normalizing flows. Experiments on FFHQ and CelebA-HQ report state-of-the-art or competitive FID, PSNR, and SSIM along with qualitative comparisons, and ablations support the decoupled representation, the augmentation, and the grouped design. The authors state that the source code and pre-trained models will be made publicly available.

Abstract

Image colorization methods have shown prominent performance on natural images. However, since humans are more sensitive to faces, existing methods are insufficient to meet the demands when applied to facial images, typically showing unnatural and uneven colorization results. In this paper, we investigate the facial image colorization task and find that the problems with facial images can be attributed to an insufficient understanding of facial components. As a remedy, by introducing facial component priors, we present a novel facial image colorization framework dubbed FCNet. Specifically, we learn a decoupled color representation for each face component (e.g., lips, skin, eyes, and hair) under the guidance of face parsing maps. A chromatic and spatial augmentation strategy is presented to facilitate the learning procedure, which requires only grayscale and color facial image pairs. After training, the presented FCNet can be naturally applied to facial image colorization with single or multiple reference images. To expand the application paradigms to scenarios with no reference images, we further train two alternative modules, which predict the color representations from the grayscale input or a random seed, respectively. Extensive experiments show that our method can perform favorably against existing methods in various application scenarios (i.e., no-, single-, and multi-reference facial image colorization). The source code and pre-trained models will be publicly available.
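To make the idea of a decoupled, per-component color representation concrete, here is a minimal sketch. It is not FCNet's learned representation branch: it assumes, purely for illustration, that a component's color code is the mean chroma inside its parsing region. The component order, the function names `component_color_codes` and `mix_references`, and the two-channel chroma input are all hypothetical.

```python
import numpy as np

# Hypothetical label order for the parsing map; the paper's label set may differ.
COMPONENTS = ["skin", "lips", "eyes", "hair"]


def component_color_codes(ab, parsing, num_components):
    """Average the chroma channels inside each parsing region.

    ab:      (H, W, 2) chroma channels of a reference image (e.g. Lab a/b).
    parsing: (H, W) integer map with one component label per pixel.
    Returns a (num_components, 2) array; rows of absent components are NaN.
    """
    codes = np.full((num_components, ab.shape[-1]), np.nan)
    for k in range(num_components):
        mask = parsing == k
        if mask.any():
            # Boolean indexing yields (n_pixels, 2); average over the pixels.
            codes[k] = ab[mask].mean(axis=0)
    return codes


def mix_references(codes_per_ref, choice):
    """Build one code set by taking component k from reference choice[k]."""
    return np.stack([codes_per_ref[r][k] for k, r in enumerate(choice)])
```

With codes extracted from two references, `mix_references([codes_a, codes_b], [0, 1, 0, 1])` would take skin and eyes from the first reference and lips and hair from the second, which mirrors the multi-reference mode described in the abstract.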

Paper Structure

This paper contains 28 sections, 1 equation, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Our method encompasses three colorization approaches, i.e., single- or multi-reference image-guided colorization in (a), sampling-guided colorization in (b), and automatic colorization in (c). In (a), the first row shows five reference images from different identities, while the second row provides the grayscale input, five colorized images referring to the five reference images, and a result whose colorization for different facial components relies on different reference images. In (b), the results are also generated according to the sampled single or multiple color representations. In (c), we give our results under automatic settings and the results of competing methods.
  • Figure 2: Overview and main training phase of our method. As depicted in the figure, the red arrows represent the data augmentation process. The orange dashed enclosure highlights the colorization network $f$. $\mathcal{D}$ is the discriminator.
  • Figure 3: Two application paradigms for no-reference scenarios. (a) shows diverse colorization, in which $g_\mathit{flow}$ randomly generates a color representation for a given grayscale image $\bm{\mathit{x}}^{l}$ by sampling from a known probability distribution. (b) shows automatic colorization, in which $g_\mathit{auto}$ encodes $\bm{\mathit{x}}^{l}$ and its associated face parsing map $\bm{\mathit{m}}^{l}$ to generate the color representation $\bm{\mathit{w}}$.
  • Figure 4: The qualitative results of our method and automatic colorization baseline methods.
  • Figure 5: The qualitative results of our method and reference image-based colorization baseline methods.
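
The sampling step in Figure 3(a) can be illustrated with a toy example. The sketch below draws a latent from a standard normal distribution and maps it to color codes through a single invertible affine transform. This is an assumption made for illustration only: the paper's $g_\mathit{flow}$ is a learned normalizing flow applied to a given grayscale image, whereas `mean`, `log_scale`, and both function names here are hypothetical and no image conditioning is modeled.

```python
import numpy as np


def sample_color_codes(rng, mean, log_scale, num_samples):
    """Draw z ~ N(0, I) and map it to codes with an invertible affine transform.

    mean, log_scale: arrays with the shape of one code set, e.g. (components, dims).
    Returns an array of shape (num_samples, *mean.shape).
    """
    z = rng.standard_normal((num_samples,) + mean.shape)
    return mean + np.exp(log_scale) * z


def invert_color_codes(w, mean, log_scale):
    """Recover the latent z from codes w; exact because the map is invertible."""
    return (w - mean) * np.exp(-log_scale)
```

Each draw gives a different code set for the same input, which is what makes the colorization diverse; invertibility is the property that lets a flow be trained by maximum likelihood.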