DACoN: DINO for Anime Paint Bucket Colorization with Any Number of Reference Images

Kazuma Nagata, Naoshi Kaneko

TL;DR

DACoN tackles automatic anime line-art colorization by fusing DINOv2's part-level semantics with CNN spatial features to propagate colors from any number of reference images. The method uses segment-wise pooling, multi-reference fusion, and a dual loss that includes a DINO-guided feature consistency term, so a single model handles both keyframe and consecutive-frame colorization. On PaintBucket-Character, DACoN outperforms prior segment-based and generation-based approaches, preserving fine detail more clearly and handling occlusions and viewpoint changes better; performance improves further as more reference images are supplied. For production pipelines, the approach allows flexible multi-reference colorization and reduces the need for dedicated per-character models, at the cost of higher memory use from the large foundation-model features.

Abstract

Automatic colorization of line drawings has been widely studied to reduce the labor cost of hand-drawn anime production. Deep learning approaches, including image/video generation and feature-based correspondence, have improved accuracy but struggle with occlusions, pose variations, and viewpoint changes. To address these challenges, we propose DACoN, a framework that leverages foundation models to capture part-level semantics, even in line drawings. Our method fuses low-resolution semantic features from foundation models with high-resolution spatial features from CNNs for fine-grained yet robust feature extraction. In contrast to previous methods that rely on the Multiplex Transformer and support only one or two reference images, DACoN removes this constraint, allowing any number of references. Quantitative and qualitative evaluations demonstrate the benefits of using multiple reference images, achieving superior colorization performance. Our code and model are available at https://github.com/kzmngt/DACoN.
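As a rough illustration of why per-image segment features remove the limit on reference count, the sketch below pools per-pixel features into one vector per segment and then assigns each target segment the color of its most similar reference segment. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `segment_pool`, `cosine`, and `propagate_colors` are hypothetical, the feature vectors stand in for fused DINO/CNN features, and nearest-neighbour matching by cosine similarity is substituted for whatever fusion and matching DACoN actually uses.

```python
import math


def segment_pool(features, segment_ids):
    """Average per-pixel feature vectors within each segment.

    features: list of per-pixel feature vectors (lists of floats).
    segment_ids: list of segment labels, one per pixel.
    Returns a dict mapping segment label -> mean feature vector.
    """
    sums, counts = {}, {}
    for vec, seg in zip(features, segment_ids):
        if seg not in sums:
            sums[seg] = [0.0] * len(vec)
            counts[seg] = 0
        sums[seg] = [s + v for s, v in zip(sums[seg], vec)]
        counts[seg] += 1
    return {seg: [s / counts[seg] for s in sums[seg]] for seg in sums}


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def propagate_colors(target_segments, references):
    """Copy to each target segment the color of its best-matching reference segment.

    target_segments: dict of segment label -> feature vector.
    references: list of (segment_features, segment_colors) pairs, one per
        reference image. Because every image is encoded independently,
        the candidates are simply concatenated, so the list can have any length.
    """
    candidates = [
        (feat, colors[seg])
        for feats, colors in references
        for seg, feat in feats.items()
    ]
    if not candidates:
        raise ValueError("at least one reference segment is required")
    return {
        seg: max(candidates, key=lambda c: cosine(feat, c[0]))[1]
        for seg, feat in target_segments.items()
    }
```

For example, pooling three pixels into two segments and matching them against two single-segment references:

```python
pooled = segment_pool([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]], [0, 0, 1])
refs = [
    ({0: [0.9, 0.1]}, {0: (255, 0, 0)}),
    ({0: [0.1, 0.9]}, {0: (0, 0, 255)}),
]
colors = propagate_colors(pooled, refs)
```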

Paper Structure

This paper contains 21 sections, 4 equations, 12 figures, and 10 tables.

Figures (12)

  • Figure 1: DACoN can colorize both keyframes, even when their compositions differ significantly from the reference images, and inbetweens that interpolate between keyframes, all with a single model.
  • Figure 2: Visualization of DINO features. The figure shows two pairs of images. In each pair, the left image is the input, and the right image is the PCA visualization of DINO features after background removal.
  • Figure 3: Overview of the DACoN pipeline. Each image is input into the model individually to extract segment features. Since the model does not reference other images, there is no limit on the number of images used for segment correspondence.
  • Figure 4: Visual comparisons of keyframe colorization results between our method and other methods.
  • Figure 5: Post-processing of colorization results for generation-based methods. (a) Resizing to the original image size. (b) Replacing each pixel with the nearest color from the reference image. (c) Unifying the color within each segment to the most frequent one.
  • ...and 7 more figures
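
Steps (b) and (c) of the Figure 5 post-processing can be sketched in a few lines. This is a minimal sketch, assuming flat lists of RGB tuples and a precomputed segment label per pixel; the helper names `nearest_palette_color` and `postprocess` are hypothetical, squared Euclidean distance in RGB is an assumed choice of "nearest", and the resize in step (a) is omitted.

```python
from collections import Counter


def nearest_palette_color(pixel, palette):
    """Step (b): return the palette color closest to `pixel`.

    Distance is squared Euclidean in RGB, which is an assumption;
    the caption does not state the metric.
    """
    return min(palette, key=lambda c: sum((p - q) ** 2 for p, q in zip(pixel, c)))


def postprocess(pixels, segment_ids, palette):
    """Snap each pixel to the reference palette, then unify each segment.

    pixels: list of RGB tuples produced by a generation-based method.
    segment_ids: list of segment labels, one per pixel.
    palette: list of RGB tuples taken from the reference image.
    """
    snapped = [nearest_palette_color(p, palette) for p in pixels]

    # Step (c): count colors per segment and keep the most frequent one.
    votes = {}
    for color, seg in zip(snapped, segment_ids):
        votes.setdefault(seg, Counter())[color] += 1
    majority = {seg: counter.most_common(1)[0][0] for seg, counter in votes.items()}

    return [majority[seg] for seg in segment_ids]
```

The majority vote is what removes stray off-color pixels inside a segment: a pixel that snaps to the wrong palette entry is overruled by its neighbours in the same segment.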