DACoN: DINO for Anime Paint Bucket Colorization with Any Number of Reference Images
Kazuma Nagata, Naoshi Kaneko
TL;DR
DACoN tackles automatic anime line-art colorization by fusing DINOv2's part-level semantic features with CNN spatial features and propagating colors from any number of reference images. The method combines segment-wise pooling, multi-reference fusion, and a dual loss that includes a DINO-guided feature consistency term, so a single model handles both keyframe and consecutive-frame colorization. On PaintBucket-Character, DACoN outperforms prior segment-based and generation-based approaches, preserving fine details more clearly and coping better with occlusions and viewpoint changes; performance improves further as more reference images are supplied. For production pipelines, the approach enables flexible multi-reference colorization and reduces the need for dedicated per-character models, at the cost of higher memory requirements due to the large foundation-model features.
Abstract
Automatic colorization of line drawings has been widely studied to reduce the labor cost of hand-drawn anime production. Deep learning approaches, including image/video generation and feature-based correspondence, have improved accuracy but struggle with occlusions, pose variations, and viewpoint changes. To address these challenges, we propose DACoN, a framework that leverages foundation models to capture part-level semantics, even in line drawings. Our method fuses low-resolution semantic features from foundation models with high-resolution spatial features from CNNs for fine-grained yet robust feature extraction. In contrast to previous methods that rely on the Multiplex Transformer and support only one or two reference images, DACoN removes this constraint, allowing any number of references. Quantitative and qualitative evaluations demonstrate the benefits of using multiple reference images and show that DACoN achieves superior colorization performance. Our code and model are available at https://github.com/kzmngt/DACoN.
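To make the idea of segment-level matching against any number of references more concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the function names (upsample_nearest, segment_features, colorize) are hypothetical, fusion by channel concatenation, average pooling per segment, and nearest-neighbor matching by cosine similarity are all simplifying assumptions, and the feature maps are taken as given rather than produced by DINOv2 or a CNN.

```python
import numpy as np

def upsample_nearest(feat, out_h, out_w):
    """Nearest-neighbour upsampling of an (h, w, c) map to (out_h, out_w, c)."""
    h, w, _ = feat.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return feat[rows][:, cols]

def segment_features(semantic, spatial, seg_ids, num_segments):
    """Fuse a low-res semantic map with a high-res spatial map, then pool per segment.

    Assumption: fusion is plain channel concatenation and pooling is a mean
    over the pixels of each line-art segment (seg_ids holds integer labels).
    """
    H, W, _ = spatial.shape
    fused = np.concatenate([upsample_nearest(semantic, H, W), spatial], axis=-1)
    flat = fused.reshape(-1, fused.shape[-1])
    ids = seg_ids.reshape(-1)
    sums = np.zeros((num_segments, flat.shape[1]))
    np.add.at(sums, ids, flat)  # accumulate pixel features per segment
    counts = np.bincount(ids, minlength=num_segments).astype(float)
    pooled = sums / np.maximum(counts, 1.0)[:, None]
    # L2-normalise so that a dot product equals cosine similarity
    norms = np.linalg.norm(pooled, axis=1, keepdims=True)
    return pooled / np.maximum(norms, 1e-8)

def colorize(target_feats, ref_feats_list, ref_colors_list):
    """Give each target segment the color of its most similar reference segment.

    Segments from all references are stacked into one pool, so the number
    of reference images is not fixed in advance.
    """
    ref_feats = np.concatenate(ref_feats_list, axis=0)
    ref_colors = np.concatenate(ref_colors_list, axis=0)
    sim = target_feats @ ref_feats.T  # (num_target_segments, num_ref_segments)
    return ref_colors[sim.argmax(axis=1)]
```

Because the reference segments are simply stacked before matching, adding another reference image only extends the candidate pool, which is one plausible reading of how a method can accept any number of references. The memory cost grows with the total number of reference segments and the feature dimension.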
