AniDoc: Animation Creation Made Easier

Yihao Meng, Hao Ouyang, Hanlin Wang, Qiuyu Wang, Wen Wang, Ka Leong Cheng, Zhiheng Liu, Yujun Shen, Huamin Qu

TL;DR

This work tackles automating colorization and in-betweening in 2D anime production by introducing AniDoc, a diffusion-based all-in-one model that uses explicit reference–sketch correspondence to colorize line-art sequences with temporal coherence. It combines correspondence-guided colorization, binarized sketch conditioning, background augmentation, and sparse-sketch training within a Stable Video Diffusion framework to enable high-fidelity colorization from a single character design. Across extensive experiments on Sakuga-42M, AniDoc achieves superior quantitative and qualitative results over state-of-the-art baselines, including robust identity preservation and effective interpolation with sparse inputs. The approach promises practical impact by fitting into existing production pipelines and reducing manual coloring and in-betweening labor for anime and digital art.

Abstract

The production of 2D animation follows an industry-standard workflow with four essential stages: character design, keyframe animation, in-betweening, and coloring. Our research aims to reduce the labor cost of this process by harnessing increasingly powerful generative AI. Built on video diffusion models, AniDoc is a video line art colorization tool that automatically converts sketch sequences into colored animations following the reference character specification. Our model exploits correspondence matching as explicit guidance, yielding strong robustness to variations (e.g., posture) between the reference character and each line art frame. In addition, our model can even automate the in-betweening process, so that users can create a temporally consistent animation by providing only a character image and the start and end sketches. Our code is available at: https://yihao-meng.github.io/AniDoc_demo.

Paper Structure

This paper contains 23 sections, 5 equations, 12 figures, 2 tables.

Figures (12)

  • Figure 1: Illustration of the workflow of 2D animation production.
  • Figure 2: Overview of the AniDoc pipeline. We adopt a two-stage training strategy. In the dense-sketch training stage, we explicitly extract matching keypoint pairs between the reference image and each frame of the training video, and construct point maps to represent the correspondences. In the sparse-sketch training stage, we remove the intermediate frame sketches and use the matching points from the start and end frames to interpolate point trajectories, which guide the generation of the intermediate frames.
  • Figure 3: Illustration of the color leakage issue with non-binarized sketches. When the previous video colorization method LVCD [huang2024lvcd] is given a non-binarized sketch, it can still generate colorized results with a color pattern similar to the ground truth, even if the reference is an empty image. After the sketch is binarized, its colorization results degrade significantly.
  • Figure 4: Visual comparison of reference-based colorization with four methods: LVCD [huang2024lvcd], LVCD+IP-Adapter [ye2023ip], ID-animator [he2024id], and ToonCrafter [xing2024tooncrafter].
  • Figure 5: Ablations on each component. "w/o matching" denotes the model without the correspondence matching module; "w/o binarize" denotes the model without binarization and background augmentation.
  • ...and 7 more figures
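
The captions of Figures 2 and 3 mention three operations that a small example may make more concrete: binarizing a sketch, interpolating point trajectories between the start and end frames, and building a point map from keypoints. The following is a minimal sketch under stated assumptions, not the authors' implementation. The fixed threshold, the linear interpolation, the integer-label point map, and all function names are hypothetical choices made for illustration only; the summary above does not specify how AniDoc performs any of these steps.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in pixel coordinates


def binarize_sketch(sketch: List[List[int]], threshold: int = 128) -> List[List[int]]:
    """Map a grayscale sketch (values 0-255) to pure black (0) or white (255).

    Assumption: a fixed global threshold. Binarization removes the soft
    gray-level shading through which color information could leak.
    """
    return [[255 if px >= threshold else 0 for px in row] for row in sketch]


def interpolate_trajectories(
    start: List[Point], end: List[Point], num_frames: int
) -> List[List[Point]]:
    """Interpolate matched keypoints from the start frame to the end frame.

    Assumption: straight-line (linear) motion. start[k] and end[k] must be
    the same keypoint matched in the two frames.
    """
    if len(start) != len(end):
        raise ValueError("start and end must contain the same matched keypoints")
    if num_frames < 2:
        raise ValueError("num_frames must be at least 2")
    frames = []
    for f in range(num_frames):
        t = f / (num_frames - 1)  # 0.0 at the start frame, 1.0 at the end frame
        frames.append(
            [
                ((1 - t) * xs + t * xe, (1 - t) * ys + t * ye)
                for (xs, ys), (xe, ye) in zip(start, end)
            ]
        )
    return frames


def point_map(points: List[Point], height: int, width: int) -> List[List[int]]:
    """Rasterize keypoints into a height x width grid.

    Assumption: pixel value k + 1 marks keypoint k and 0 means "no keypoint".
    Points that fall outside the grid are skipped.
    """
    grid = [[0] * width for _ in range(height)]
    for k, (x, y) in enumerate(points):
        col, row = int(round(x)), int(round(y))
        if 0 <= row < height and 0 <= col < width:
            grid[row][col] = k + 1
    return grid
```

In this toy setup, calling `interpolate_trajectories` with the keypoints matched in the start and end sketches yields one keypoint list per frame, and `point_map` turns each list into a per-frame guidance map. A practical system would likely need a learned matcher and a richer trajectory model than straight lines.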