Colorful Image Colorization

Richard Zhang, Phillip Isola, Alexei A. Efros

TL;DR

This paper tackles automatic colorization of grayscale images, a highly underconstrained task. It proposes a CNN that predicts a multimodal distribution over quantized ab values, and it uses class rebalancing during training and an annealed-mean decoding step to produce vibrant results. The learned representations also serve as a strong self-supervised signal, improving downstream classification and segmentation, and the approach generalizes to legacy photos. The method outperforms prior colorization approaches on perceptual realism and reaches state-of-the-art results on several feature learning benchmarks, establishing colorization as both a graphics tool and a viable pretext task for representation learning.

Abstract

Given a grayscale photograph as input, this paper attacks the problem of hallucinating a plausible color version of the photograph. This problem is clearly underconstrained, so previous approaches have either relied on significant user interaction or resulted in desaturated colorizations. We propose a fully automatic approach that produces vibrant and realistic colorizations. We embrace the underlying uncertainty of the problem by posing it as a classification task and use class-rebalancing at training time to increase the diversity of colors in the result. The system is implemented as a feed-forward pass in a CNN at test time and is trained on over a million color images. We evaluate our algorithm using a "colorization Turing test," asking human participants to choose between a generated and ground truth color image. Our method successfully fools humans on 32% of the trials, significantly higher than previous methods. Moreover, we show that colorization can be a powerful pretext task for self-supervised feature learning, acting as a cross-channel encoder. This approach results in state-of-the-art performance on several feature learning benchmarks.
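
The class-rebalancing idea mentioned in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' code: it assumes each quantized color bin is weighted inversely to its empirical frequency after that frequency is smoothed toward a uniform distribution. The function name `rebalancing_weights`, the mixing parameter `lam`, and the toy histogram are all hypothetical and introduced only for this example.

```python
def rebalancing_weights(p, lam=0.5):
    """Return per-bin loss weights that up-weight rare color bins.

    p   : empirical probability of each quantized color bin (sums to 1)
    lam : hypothetical mixing weight with the uniform distribution
    """
    q = len(p)
    # Smooth the empirical distribution toward uniform so that
    # very rare bins do not receive an unbounded weight.
    smoothed = [(1.0 - lam) * p_i + lam / q for p_i in p]
    raw = [1.0 / s for s in smoothed]
    # Normalize so the expected weight under p equals 1.
    expected = sum(p_i * w_i for p_i, w_i in zip(p, raw))
    return [w_i / expected for w_i in raw]


# Toy histogram: one dominant (desaturated) bin and three rare (vivid) bins.
p = [0.90, 0.07, 0.02, 0.01]
w = rebalancing_weights(p)
```

In this toy case the rarer a bin is, the larger its weight, so errors on rare, saturated colors would count for more in a weighted classification loss than errors on the dominant bin.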

Paper Structure

This paper contains 21 sections, 4 equations, 19 figures, 5 tables.

Figures (19)

  • Figure 1: Example input grayscale photos and output colorizations from our algorithm. These examples are cases where our model works especially well. Please visit http://richzhang.github.io/colorization/ to see the full range of results and to try our model and code. Best viewed in color (obviously).
  • Figure 2: Our network architecture. Each conv layer refers to a block of 2 or 3 repeated conv and ReLU layers, followed by a BatchNorm (Ioffe and Szegedy, 2015) layer. The net has no pool layers. All changes in resolution are achieved through spatial downsampling or upsampling between conv blocks.
  • Figure 3: (a) Quantized $ab$ color space with a grid size of 10. A total of 313 $ab$ pairs are in gamut. (b) Empirical probability distribution of $ab$ values, shown in log scale. (c) Empirical probability distribution of $ab$ values, conditioned on $L$, shown in log scale.
  • Figure 4: The effect of temperature parameter $T$ on the annealed-mean output (the annealed-mean equation). The left-most images show the means of the predicted color distributions and the right-most show the modes. We use $T=0.38$ in our system.
  • Figure 5: Example results from our ImageNet test set. Our classification loss with rebalancing produces more accurate and vibrant results than a regression loss or a classification loss without rebalancing. Successful colorizations are above the dotted line. Common failures are below. These include failure to capture long-range consistency, frequent confusions between red and blue, and a default sepia tone on complex indoor scenes. Please visit http://richzhang.github.io/colorization/ to see the full range of results.
  • ...and 14 more figures
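
The Figure 4 caption describes a temperature $T$ that moves the decoded color between the mean and the mode of the predicted distribution. The sketch below illustrates that behavior for a single pixel. It assumes the temperature sharpens the predicted distribution in log space (a softmax-style re-normalization) before the expectation over bin centers is taken. The paper's own equation is not reproduced in this excerpt, so the function name `annealed_mean`, the clamping constant, and the two-bin example are assumptions made for illustration.

```python
import math


def annealed_mean(probs, centers, T=0.38):
    """Decode one (a, b) value from a distribution over color bins.

    probs   : predicted probability for each quantized bin
    centers : (a, b) center of each bin, same order as probs
    T       : temperature; T=1 gives the mean, T near 0 approaches the mode
    """
    # Re-scale log-probabilities by 1/T; clamp to avoid log(0).
    logits = [math.log(max(p, 1e-12)) / T for p in probs]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(logits)
    weights = [math.exp(l - m) for l in logits]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Expectation of the bin centers under the sharpened distribution.
    a = sum(w * c[0] for w, c in zip(weights, centers))
    b = sum(w * c[1] for w, c in zip(weights, centers))
    return a, b


# Hypothetical bimodal prediction: two plausible colors far apart in ab space.
probs = [0.6, 0.4]
centers = [(-50.0, 0.0), (50.0, 0.0)]

mean_like = annealed_mean(probs, centers, T=1.0)   # plain mean of the two modes
mode_like = annealed_mean(probs, centers, T=0.01)  # close to the dominant bin
default = annealed_mean(probs, centers)            # T=0.38, in between
```

With a bimodal prediction like this one, the plain mean lands between the two candidate colors, which is one way desaturated output can arise, while a low temperature commits to the more likely color. An intermediate temperature trades off between the two.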