MangaNinja: Line Art Colorization with Precise Reference Following

Zhiheng Liu, Ka Leong Cheng, Xi Chen, Jie Xiao, Hao Ouyang, Kai Zhu, Yu Liu, Yujun Shen, Qifeng Chen, Ping Luo

TL;DR

MangaNinja tackles reference-guided line art colorization by learning local correspondences between a reference image and line art through a dual-branch diffusion framework equipped with progressive patch shuffling and PointNet-based fine-grained point guidance. The method enables robust, pixel-precise color transfer even under discrepant references or multi-reference scenarios, using a Reference U-Net to fuse reference features with a Denoising U-Net and a point-conditioned cross-attention scheme. Quantitative and qualitative results on a self-constructed anime benchmark show state-of-the-art performance in color fidelity and semantic consistency, while supporting interactive point control for challenging cases. The work also provides a standardized evaluation protocol and demonstrates clear practical value for accelerating colorization in the anime industry.

Abstract

Derived from diffusion models, MangaNinja specializes in the task of reference-guided line art colorization. We incorporate two thoughtful designs to ensure precise character detail transcription: a patch shuffling module to facilitate correspondence learning between the reference color image and the target line art, and a point-driven control scheme to enable fine-grained color matching. Experiments on a self-collected benchmark demonstrate the superiority of our model over current solutions in terms of precise colorization. We further showcase the potential of the proposed interactive point control in handling challenging cases, cross-character colorization, and multi-reference harmonization, all of which are beyond the reach of existing algorithms.
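To make the patch shuffling idea concrete, here is a minimal NumPy sketch of cutting a reference image into a grid of patches and permuting them, so that a model cannot rely on global layout and has to match local regions instead. The function names, the coarse-to-fine grid sizes, and the step-based schedule are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def shuffle_patches(image, grid, rng):
    """Split an H x W x C image into grid x grid patches and randomly permute them."""
    h, w, c = image.shape
    if h % grid or w % grid:
        raise ValueError("image height and width must be divisible by grid")
    ph, pw = h // grid, w // grid
    # (H, W, C) -> (grid_y, ph, grid_x, pw, C) -> (grid_y * grid_x, ph, pw, C)
    patches = (
        image.reshape(grid, ph, grid, pw, c)
        .transpose(0, 2, 1, 3, 4)
        .reshape(grid * grid, ph, pw, c)
    )
    patches = patches[rng.permutation(grid * grid)]
    # Reassemble the permuted patches into an image of the original shape.
    return (
        patches.reshape(grid, grid, ph, pw, c)
        .transpose(0, 2, 1, 3, 4)
        .reshape(h, w, c)
    )


def grid_for_step(step, total_steps, grids=(2, 4, 8, 16, 32)):
    """Hypothetical schedule: use finer grids as training progresses."""
    idx = min(step * len(grids) // total_steps, len(grids) - 1)
    return grids[idx]


rng = np.random.default_rng(0)
reference = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
shuffled = shuffle_patches(reference, grid_for_step(0, 1000), rng)
```

Shuffling only rearranges pixels, so the shuffled reference keeps every color of the original while discarding its spatial arrangement at the chosen patch scale.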

Paper Structure (16 sections, 3 equations, 6 figures, 2 tables)


Figures (6)

  • Figure 1: Visualization of point guidance. By introducing points as guidance, MangaNinja can tackle many challenging cases, such as significant variations between the reference image and the line art, while preserving details.
  • Figure 2: The training process of MangaNinja. We randomly select two frames from video data, using one frame as the reference image and extracting the line art from the other. The reference image and the line art are fed into the Reference U-Net and the Denoising U-Net, respectively. To enhance the model's automatic matching and fine-grained control capabilities, we propose a series of training strategies, including progressive patch shuffling. Additionally, we employ an off-the-shelf model to extract matching points from the two frames, and the resulting point maps are fed into the main branch through PointNet.
  • Figure 3: Qualitative comparisons. We compare our method with the state-of-the-art non-generative colorization method BasicPBC, the consistency generation method IP-Adapter, and AnyDoor. The results demonstrate that our method significantly outperforms them in terms of colorization accuracy and generated image quality. Notably, the results shown for our method were generated without point guidance.
  • Figure 4: Visualization of varying poses or missing details. With point guidance, MangaNinja can tackle many challenging cases. For instance, in the first two rows, there are significant variations between the reference image and line art. Furthermore, users can employ point guidance to colorize regions or elements with no matches in the reference; for example, the lower parts of the clothing are missing in the reference image of the third sample. When dealing with multiple objects, point guidance effectively prevents color confusion, as demonstrated in the last row.
  • Figure 5: Visualization of multi-ref colorization. MangaNinja enables users to select specific areas from multiple reference images through points, providing guidance for all elements in the line art. Additionally, it effectively resolves conflicts between similar visual elements across the reference images.
  • ...and 1 more figure
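The Figure 2 caption mentions that matched points from the two frames are turned into point maps before being passed through PointNet. One plausible way to build such maps is sketched below: each matched pair gets a shared integer ID that is written at the corresponding pixel in a reference map and a target map. The ID encoding, the `max_points` cap, and the function name `build_point_maps` are assumptions made for illustration; the caption does not specify how the maps are encoded.

```python
import numpy as np


def build_point_maps(ref_points, tgt_points, height, width, max_points=24):
    """Encode matched (row, col) pairs as two integer maps sharing one ID per pair.

    Zero means "no point"; pair k (1-based) is marked with the value k in both maps.
    """
    if len(ref_points) != len(tgt_points):
        raise ValueError("ref_points and tgt_points must have the same length")
    ref_map = np.zeros((height, width), dtype=np.int32)
    tgt_map = np.zeros((height, width), dtype=np.int32)
    # Keep at most max_points pairs (a hypothetical cap, not from the source).
    pairs = list(zip(ref_points, tgt_points))[:max_points]
    for pair_id, ((ry, rx), (ty, tx)) in enumerate(pairs, start=1):
        ref_map[ry, rx] = pair_id
        tgt_map[ty, tx] = pair_id
    return ref_map, tgt_map


# Two matched pairs between a reference frame and a line-art frame.
ref_map, tgt_map = build_point_maps(
    ref_points=[(1, 2), (5, 5)],
    tgt_points=[(3, 0), (6, 4)],
    height=8,
    width=8,
)
```

Sharing one ID across both maps is what ties a location in the reference to a location in the line art; at inference time the same structure could be filled from user clicks rather than from an automatic matcher.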