
Cobra: Efficient Line Art COlorization with BRoAder References

Junhao Zhuang, Lingen Li, Xuan Ju, Zhaoyang Zhang, Chun Yuan, Ying Shan

TL;DR

Cobra tackles the challenge of colorizing line art with extensive reference guidance by designing a long-context diffusion framework that scales to over 200 reference images while maintaining low latency. It introduces a Causal Sparse DiT with a KV-Cache and a Localized Reusable Position Encoding, which together fuse many references efficiently without altering the pre-trained 2D positional encodings. A Line Art Guider, along with a Self-Attention-Only block, line-art style augmentation, and a hint-point sampling strategy, enables precise color ID preservation and flexible color hints. Empirical results on Cobra-Bench show better image quality, color ID accuracy, and speed than ColorFlow and other baselines, with direct relevance to industrial multi-reference comic colorization.
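The idea behind a localized, reusable position encoding can be illustrated with a small sketch. The layout here is an assumption, not the paper's exact scheme. Each image is treated as a grid of patch tokens with 2D (row, col) position IDs. References are assigned cyclically to a hypothetical set of eight slots around the target, and each reference is assumed to be resized so its token grid fits inside one target-sized slot. Because the slots are reused, the range of position IDs stays bounded no matter how many references are added. That bounded range is the property that lets a pre-trained 2D encoding be used unchanged. `SLOTS`, `grid_positions`, and `localized_reusable_positions` are illustrative names.

```python
import numpy as np

# Hypothetical slot offsets (row, col), in units of the target's token height/width.
# References are placed in cells adjacent to the target, and slots are reused cyclically.
SLOTS = [(0, -1), (0, 1), (-1, 0), (1, 0), (-1, -1), (-1, 1), (1, -1), (1, 1)]


def grid_positions(h, w, row0=0, col0=0):
    """Return an (h*w, 2) array of (row, col) position IDs for an h x w token grid."""
    rows, cols = np.meshgrid(np.arange(h) + row0, np.arange(w) + col0, indexing="ij")
    return np.stack([rows.ravel(), cols.ravel()], axis=-1)


def localized_reusable_positions(target_hw, ref_hws):
    """Assign 2D position IDs to the target and to any number of references.

    Assumption: every reference token grid fits inside one target-sized slot,
    so references never overlap the target in position space.
    """
    th, tw = target_hw
    target = grid_positions(th, tw)
    refs = []
    for i, (rh, rw) in enumerate(ref_hws):
        if rh > th or rw > tw:
            raise ValueError("reference grid must fit inside a target-sized slot")
        dr, dc = SLOTS[i % len(SLOTS)]  # reuse slots once all eight are taken
        # Each reference keeps its own aspect ratio (rh x rw tokens) inside its slot.
        refs.append(grid_positions(rh, rw, row0=dr * th, col0=dc * tw))
    return target, refs
```

For example, with a 4 x 6 target grid and 200 references of 4 x 3 tokens, every reference position ID stays within rows -4 to 7 and columns -6 to 8. References 0 and 8 receive identical IDs because they share a slot.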

Abstract

The comic production industry requires reference-based line art colorization with high accuracy, efficiency, contextual consistency, and flexible control. A comic page often involves diverse characters, objects, and backgrounds, which complicates the coloring process. Despite advances in diffusion models for image generation, their application to line art colorization remains limited by three challenges: handling extensive reference images, time-consuming inference, and flexible control. We investigate the necessity of extensive contextual image guidance for high-quality line art colorization. To address these challenges, we introduce Cobra, an efficient and versatile method that supports color hints and uses over 200 reference images while maintaining low latency. Central to Cobra is a Causal Sparse DiT architecture, which leverages specially designed positional encodings, causal sparse attention, and a Key-Value Cache to manage long-context references and ensure color identity consistency. Results demonstrate that Cobra achieves accurate line art colorization through extensive contextual reference while significantly improving inference speed and interactivity, thereby meeting critical industrial demands. We release our code and models on our project page: https://zhuang2002.github.io/Cobra/.
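The causal sparse attention pattern and the reason a KV-Cache applies can be sketched for a single attention head. The pattern follows the Figure 2 description: reference-to-reference pairs are excluded, and attention is unidirectional. The concrete mask layout is an assumption. Each reference attends only to its own tokens. The target tokens (line art plus noisy latent) attend to every reference and to themselves. Reference tokens never attend to target tokens, so their keys and values do not depend on the noisy latent. Assuming references are also processed independently of the denoising timestep, their keys and values can be computed once and reused across steps. `causal_sparse_mask`, `masked_attention`, and `target_attention_with_cache` are hypothetical helpers, not the authors' code.

```python
import numpy as np


def causal_sparse_mask(ref_lens, tgt_len):
    """Boolean (N, N) mask; mask[q, k] is True when query q may attend to key k."""
    n_ref = sum(ref_lens)
    n = n_ref + tgt_len
    mask = np.zeros((n, n), dtype=bool)
    start = 0
    for length in ref_lens:
        # Each reference attends only to its own tokens: no reference-reference pairs,
        # and no reference-to-target attention (unidirectional).
        mask[start:start + length, start:start + length] = True
        start += length
    # Target tokens (line art + noisy latent) attend to all references and to themselves.
    mask[n_ref:, :] = True
    return mask


def masked_attention(q, k, v, mask):
    """Single-head scaled dot-product attention with a boolean mask."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    scores -= scores.max(axis=-1, keepdims=True)  # every row has at least one allowed key
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


def target_attention_with_cache(q_tgt, k_tgt, v_tgt, k_cache, v_cache):
    """Target queries attend to cached reference K/V plus the current target K/V."""
    k = np.concatenate([k_cache, k_tgt])
    v = np.concatenate([v_cache, v_tgt])
    allowed = np.ones((q_tgt.shape[0], k.shape[0]), dtype=bool)
    return masked_attention(q_tgt, k, v, allowed)
```

Take 10 references of 16 tokens each and a 16-token target. Full attention scores 176 x 176 = 30,976 query-key pairs, while this mask keeps 10 x 256 + 16 x 176 = 5,376. The target rows computed from cached reference keys and values match the target rows of the full masked computation.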

Paper Structure

This paper contains 27 sections, 4 equations, 12 figures, and 5 tables.

Figures (12)

  • Figure 1: The overview of Cobra. This figure depicts the framework of Cobra, which uses a large collection of retrieved reference images to guide the colorization of comic line art. Localized reusable positional encoding lets the framework handle an arbitrary number of contextual reference images while keeping appropriate aspect ratios and resolutions. The causal sparse DiT architecture processes these long contextual references, improving identity preservation and color accuracy while reducing computational complexity. Optional color hints give users additional control, and the resulting high-quality colorization is well suited to industrial applications.
  • Figure 2: Illustration of the transition from Full Attention to Causal Sparse Attention. This figure highlights the reduction in computational complexity achieved by excluding pairwise calculations among reference images. Additionally, the application of unidirectional causal attention, along with the use of KV-Cache, further enhances computational efficiency while ensuring effective transmission of essential color ID information.
  • Figure 3: An example of line art style augmentation, demonstrating the blending of outputs from two distinct line art extractors. This strategy improves the robustness of the Line Art Guider to diverse artistic styles.
  • Figure 4: Hint Point Sampling Strategy. This method reduces ambiguity by limiting the RGB pixel-value variance within each hint point to at most 0.01, which prevents hint points from being placed at edge intersections during training. We also visualize 30,000 randomly sampled hint points to show their distribution.
  • Figure 5: Qualitative results of line art colorization, highlighting how Cobra outperforms other methods by accurately preserving color IDs and providing high-quality results, even in complex scenarios.
  • ...and 7 more figures
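The hint point sampling rule from Figure 4 can be sketched as rejection sampling. Only the 0.01 variance threshold comes from the caption. The following details are assumptions: images are RGB arrays scaled to [0, 1], a hint point covers a square patch of hypothetical radius 2, and "variance" means the largest per-channel variance over that patch. Patches that straddle an edge or line intersection mix colors and exceed the threshold, so they are rejected. The hint color is the patch mean. `sample_hint_points` and its parameters are illustrative.

```python
import numpy as np


def sample_hint_points(image, num_points, radius=2, max_var=0.01, max_tries=10_000, rng=None):
    """Sample hint points whose surrounding patch is nearly uniform in color.

    image: (H, W, 3) float array in [0, 1] (assumed scaling).
    Returns a list of (row, col, rgb) tuples, where rgb is the patch mean color.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w, _ = image.shape
    points = []
    for _ in range(max_tries):
        if len(points) == num_points:
            break
        # Keep the whole (2*radius+1)^2 patch inside the image.
        r = int(rng.integers(radius, h - radius))
        c = int(rng.integers(radius, w - radius))
        patch = image[r - radius:r + radius + 1, c - radius:c + radius + 1].reshape(-1, 3)
        # Reject ambiguous patches (e.g. on edges or intersections) whose colors vary too much.
        if patch.var(axis=0).max() <= max_var:
            points.append((r, c, patch.mean(axis=0)))
    return points
```

For example, on a 32 x 32 image that is black on the left half and white on the right, every accepted point lies at least three pixels from the boundary. Its hint color is pure black or pure white.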