ColorFlow: Retrieval-Augmented Image Sequence Colorization

Junhao Zhuang, Xuan Ju, Zhaoyang Zhang, Yong Liu, Shiyi Zhang, Chun Yuan, Ying Shan

TL;DR

ColorFlow tackles the challenge of automatic colorization for black-and-white image sequences while preserving fine-grained color IDs across frames. It introduces a three-stage retrieval-augmented pipeline, an in-context diffusion colorization module with a Colorization Guider, and a guided super-resolution stage to maintain identity consistency and high fidelity. Through ColorFlow-Bench, the method achieves state-of-the-art results across perceptual and pixel-based metrics and garners strong user-study support, demonstrating industrial viability for manga and animation production. The work highlights the importance of robust multi-reference retrieval, context-aware color transfer, and artifact-reducing upscaling for scalable, high-quality sequential colorization, while acknowledging ethical considerations around data bias and content authenticity.

Abstract

Automatic black-and-white image sequence colorization while preserving character and object identity (ID) is a complex task with significant market demand, such as in cartoon or comic series colorization. Despite advancements in visual colorization using large-scale generative models like diffusion models, challenges with controllability and identity consistency persist, making current solutions unsuitable for industrial application. To address this, we propose ColorFlow, a three-stage diffusion-based framework tailored for image sequence colorization in industrial applications. Unlike existing methods that require per-ID finetuning or explicit ID embedding extraction, we propose a novel, robust, and generalizable Retrieval Augmented Colorization pipeline for colorizing images with relevant color references. Our pipeline also features a dual-branch design: one branch for color identity extraction and the other for colorization, leveraging the strengths of diffusion models. We utilize the self-attention mechanism in diffusion models for strong in-context learning and color identity matching. To evaluate our model, we introduce ColorFlow-Bench, a comprehensive benchmark for reference-based colorization. Results show that ColorFlow outperforms existing models across multiple metrics, setting a new standard in sequential image colorization and potentially benefiting the art industry. We release our code and models on our project page: https://zhuang2002.github.io/ColorFlow/.
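
To make the retrieval idea concrete, here is a minimal sketch of selecting the most relevant color references for a black-and-white query from a reference pool. It is not the authors' implementation: the abstract does not say which feature extractor or similarity measure is used, so the embeddings, the cosine similarity, the function name `top_k_references`, and the parameter `k` are all assumptions made for illustration.

```python
import numpy as np


def top_k_references(query_emb, pool_embs, k=3):
    """Return indices and scores of the k pool embeddings closest to the query.

    Assumption: similarity is cosine similarity between precomputed
    embeddings; the source does not specify the actual measure.
    """
    query_emb = np.asarray(query_emb, dtype=np.float64)
    pool_embs = np.asarray(pool_embs, dtype=np.float64)
    # Normalize so that a dot product equals cosine similarity.
    q = query_emb / np.linalg.norm(query_emb)
    p = pool_embs / np.linalg.norm(pool_embs, axis=1, keepdims=True)
    sims = p @ q
    # Sort by descending similarity and keep the first k entries.
    order = np.argsort(-sims)[:k]
    return order, sims[order]


# Toy 2-D embeddings standing in for features of reference image patches.
pool = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]])
query = np.array([0.9, 0.1])
idx, scores = top_k_references(query, pool, k=2)
print(idx, scores)
```

In a pipeline of this kind, the selected references would then be passed to the colorization stage together with the black-and-white input, so the model can match colors in context.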

Paper Structure

This paper contains 28 sections, 6 equations, 13 figures, 6 tables.

Figures (13)

  • Figure 1: ColorFlow is the first model designed for fine-grained ID preservation in image sequence colorization, utilizing contextual information. Given a reference image pool, ColorFlow accurately generates colors for various elements in black and white image sequences, including the hair color and attire of characters, ensuring color consistency with the reference images. [Best viewed in color with zoom-in].
  • Figure 2: The overview of ColorFlow. This figure presents the three primary components of our framework: the Retrieval-Augmented Pipeline (RAP), the In-context Colorization Pipeline (ICP), and the Guided Super-Resolution Pipeline (GSRP). Each component is essential for maintaining the color identity of instances across black-and-white image sequences while ensuring high-quality colorization.
  • Figure 3: Patch-Wise training strategy is designed to reduce the computational demands of training on high-resolution stitched images. The left box displays segmented stitched images from the training phase, with the corresponding masks also segmented accordingly. The right box presents the complete stitched image and masks for the inference phase.
  • Figure 4: Screenstyle augmentation. From left to right: the colored manga, the grayscale manga, linear interpolations between the grayscale manga and the ScreenVAE [xie2020manga] output with proportions of 0.66 and 0.33, and the ScreenVAE output.
  • Figure 5: Visualization of the heatmap for the self-attention map of the selected colorization region (encircled in red).
  • ...and 8 more figures
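
The linear interpolation mentioned in the Figure 4 caption can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the caption does not say which image each proportion weights, so here `alpha` is assumed to be the weight on the grayscale image, and small arrays stand in for the grayscale manga and the ScreenVAE output. The function name `screenstyle_blend` is hypothetical.

```python
import numpy as np


def screenstyle_blend(gray, screen, alpha):
    """Blend two same-shaped images: alpha * gray + (1 - alpha) * screen.

    Assumption: alpha weights the grayscale image; the caption only
    gives the proportions 0.66 and 0.33 without naming the weighted side.
    """
    gray = np.asarray(gray, dtype=np.float32)
    screen = np.asarray(screen, dtype=np.float32)
    if gray.shape != screen.shape:
        raise ValueError("gray and screen must have the same shape")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * gray + (1.0 - alpha) * screen


# Placeholder images: an all-black "grayscale" and an all-white "screentone".
gray = np.zeros((2, 2), dtype=np.float32)
screen = np.ones((2, 2), dtype=np.float32)

# The sequence in the caption: grayscale, two blends, ScreenVAE output.
variants = [screenstyle_blend(gray, screen, a) for a in (1.0, 0.66, 0.33, 0.0)]
```

With `alpha = 1.0` the result is the grayscale image and with `alpha = 0.0` it is the screentone image, which matches the left-to-right ordering described in the caption.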