Automatic Controllable Colorization via Imagination
Xiaoyan Cong, Yue Wu, Qifeng Chen, Chenyang Lei
TL;DR
This work tackles the multimodal challenge of automatic colorization by introducing an imagination-based framework that uses pretrained diffusion priors to synthesize multiple semantically aligned reference images for a grayscale input. A Reference Refinement Module constructs an optimal, instance-aware reference from these candidates, enabling controllable, editable colorization via a UniColor-inspired colorization module that generates and propagates hint colors. The approach yields photorealistic, vivid results with higher editability and diversity than prior methods, demonstrated through quantitative metrics and user studies on COCO-Stuff, ImageNet, and in-the-wild images. By explicitly modeling and composing coloring samples, the framework supports iterative and localized edits, suggesting broader potential for applying imaginative priors to other computer vision tasks.
Abstract
We propose a framework for automatic colorization that allows for iterative editing and modifications. The core of our framework lies in an imagination module: by understanding the content within a grayscale image, we utilize a pre-trained image generation model to generate multiple images that contain the same content. These images serve as references for coloring, mimicking the process of human experts. As the synthesized images can be imperfect or different from the original grayscale image, we propose a Reference Refinement Module to select the optimal reference composition. Unlike most previous end-to-end automatic colorization algorithms, our framework allows for iterative and localized modifications of the colorization results because we explicitly model the coloring samples. Extensive experiments demonstrate the superiority of our framework over existing automatic colorization algorithms in editability and flexibility. Project page: https://xy-cong.github.io/imagine-colorization.
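The idea of composing one reference out of several imperfect candidates can be illustrated with a minimal sketch. It is not the paper's Reference Refinement Module: the function names `luminance` and `compose_reference`, the precomputed instance masks, and the per-region luminance error used as the selection score are all assumptions made for illustration. For each region, the sketch keeps the colors of whichever candidate best matches the grayscale input there.

```python
import numpy as np


def luminance(rgb):
    """Convert an HxWx3 RGB array in [0, 1] to HxW luma (Rec. 601 weights)."""
    return rgb @ np.array([0.299, 0.587, 0.114])


def compose_reference(gray, candidates, masks):
    """Hypothetical per-region reference selection.

    gray:       HxW float array, the grayscale input.
    candidates: list of HxWx3 float arrays, the synthesized references.
    masks:      list of non-empty HxW boolean arrays, one per instance.

    Assumption: mean absolute luminance error inside a mask stands in
    for whatever similarity measure the actual module uses.
    """
    # Pixels not covered by any mask stay gray (no color information).
    composed = np.repeat(gray[..., None], 3, axis=2).astype(float)
    chosen = []
    for mask in masks:
        errors = [
            np.abs(luminance(c)[mask] - gray[mask]).mean() for c in candidates
        ]
        best = int(np.argmin(errors))
        # Copy the winning candidate's colors into this region only.
        composed[mask] = candidates[best][mask]
        chosen.append(best)
    return composed, chosen
```

Because the selection is recorded per region in `chosen`, a localized edit in this sketch amounts to overriding one entry and recomposing that region, which mirrors the kind of iterative, region-level modification the abstract describes.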
