Automatic Controllable Colorization via Imagination

Xiaoyan Cong, Yue Wu, Qifeng Chen, Chenyang Lei

TL;DR

This work tackles the multimodal challenge of automatic colorization by introducing an imagination-based framework that uses pretrained diffusion priors to synthesize multiple semantically aligned reference images for a grayscale input. A Reference Refinement Module constructs an optimal, instance-aware reference from these candidates, enabling controllable, editable colorization via a UniColor-inspired colorization module that generates and propagates hint colors. The approach yields photorealistic, vivid results with higher editability and diversity than prior methods, demonstrated through quantitative metrics and user studies on COCO-stuff, ImageNet, and in-the-wild images. By explicitly modeling and composing coloring samples, the framework supports iterative and localized edits, suggesting broader potential for applying imaginative priors to other computer vision tasks.

Abstract

We propose a framework for automatic colorization that allows for iterative editing and modifications. The core of our framework lies in an imagination module: by understanding the content within a grayscale image, we utilize a pre-trained image generation model to generate multiple images that contain the same content. These images serve as references for coloring, mimicking the process of human experts. As the synthesized images can be imperfect or different from the original grayscale image, we propose a Reference Refinement Module to select the optimal reference composition. Unlike most previous end-to-end automatic colorization algorithms, our framework allows for iterative and localized modifications of the colorization results because we explicitly model the coloring samples. Extensive experiments demonstrate the superiority of our framework over existing automatic colorization algorithms in editability and flexibility. Project page: https://xy-cong.github.io/imagine-colorization.

Paper Structure (21 sections, 5 equations, 12 figures, 4 tables)

Figures (12)

  • Figure 1: Our colorization results. By leveraging our designed Imagination Module, our framework can achieve photorealistic and vivid colorization results.
  • Figure 2: Framework Overview. Given a black-and-white input, our framework first synthesizes a semantically similar, spatially aligned, and instance-aware reference by mimicking the imagination process of human experts. The colorization module then colorizes the black-and-white image under the guidance of this reference.
  • Figure 3: Imagination Module and Reference Refinement Module. In the Imagination Module, given a grayscale input $\mathbf{X}$, we generate $N$ reference candidates $\mathbf{C}$, $N_1$ of which are conditioned on the Canny edges of $\mathbf{X}$ and $N_2$ of which are conditioned on the HED boundaries of $\mathbf{X}$, with $N = N_1 + N_2$. In the Reference Refinement Module, we first extract the segmentation $\mathbf{S}$ of $\mathbf{X}$. For each segment $\mathbf{S}^j$, we obtain the corresponding segment $\mathbf{R} \odot \mathbf{S}^j$ of the optimal reference $\mathbf{R}$ by selecting the nearest neighbor of $\mathbf{X} \odot \mathbf{S}^j$ among all reference candidate segments $\mathbf{C}_i \odot \mathbf{S}^j$, measured by distance in the robust and universal DINOv2 [oquab2023dinov2] feature space.
  • Figure 4: ControlNet Issues. Directly applying ControlNet [zhang2023adding] to colorization can cause issues such as inconsistency, grayish results, and color bleeding. We use gray images as the conditions of ControlNet and the corresponding color images as ground truth. A pretrained Stable Diffusion model is used as the initialization. We then train the model on commonly used colorization datasets.
  • Figure 5: Diverse Colorization. The Imagination Module can synthesize diverse colorful references, yielding diverse colorization results.
  • ...and 7 more figures
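
The per-segment selection described in the Figure 3 caption can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `toy_features` and `compose_reference` are hypothetical, and a simple luminance histogram stands in for the DINOv2 features so that the example runs with NumPy alone. It assumes the grayscale input, the candidates, and the segmentation map share the same spatial size, with pixel values in [0, 1].

```python
import numpy as np

def toy_features(image, mask, bins=8):
    """Stand-in for a DINOv2 embedding (assumption): a normalized
    luminance histogram of the pixels selected by `mask`."""
    lum = image.mean(axis=-1) if image.ndim == 3 else image
    hist, _ = np.histogram(lum[mask], bins=bins, range=(0.0, 1.0))
    return hist / max(hist.sum(), 1)

def compose_reference(gray, candidates, seg, feat_fn=toy_features):
    """For each segment S^j, copy pixels from the candidate C_i whose
    masked features are nearest to those of the masked input X.

    gray:       (H, W) grayscale input X
    candidates: (N, H, W, 3) reference candidates C
    seg:        (H, W) integer segment labels S
    Returns the composed reference R and the chosen candidate per segment.
    """
    reference = np.zeros_like(candidates[0])
    chosen = {}
    for label in np.unique(seg):
        mask = seg == label
        target = feat_fn(gray, mask)
        # Euclidean distance between masked candidate and masked input features
        dists = [np.linalg.norm(feat_fn(c, mask) - target) for c in candidates]
        best = int(np.argmin(dists))
        reference[mask] = candidates[best][mask]
        chosen[int(label)] = best
    return reference, chosen
```

Because each segment is chosen independently, the composed reference can mix regions from different candidates, which is what makes later localized edits possible: swapping the candidate for one segment leaves the rest of the reference unchanged.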