Diffutoon: High-Resolution Editable Toon Shading via Diffusion Models

Zhongjie Duan; Chengyu Wang; Cen Chen; Weining Qian; Jun Huang

TL;DR

Diffutoon is a diffusion-model-based framework for high-resolution toon shading that renders photorealistic videos in anime style. It decomposes toon shading into four subproblems (stylization, consistency enhancement, structure guidance, and colorization) and uses a dual-pipeline architecture: a main toon shading pipeline and an editing branch. Together they support both high-quality rendering of long videos and prompt-driven content edits. The method uses ControlNet signals for outlines and colors, AnimateDiff-based motion modules for temporal coherence, and a sliding-window latent-diffusion strategy with memory-efficient flash attention to reach resolutions up to $1536\times1536$. Experiments on a 10-video dataset, including human evaluation, show improvements over baselines in aesthetics and consistency; qualitative case studies and ablations support the design choices. Code and example videos are available on GitHub, making the method practical for high-fidelity non-photorealistic rendering (NPR) and controllable video editing.
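
The sliding-window idea mentioned above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `sliding_windows` and `blend_windows`, the use of one scalar per frame in place of a latent tensor, and the uniform averaging of overlapping windows are all assumptions made for illustration. It only shows how a long video can be processed in fixed-size overlapping chunks whose outputs are merged per frame.

```python
from typing import Callable, List, Sequence, Tuple


def sliding_windows(num_frames: int, window_size: int, stride: int) -> List[Tuple[int, int]]:
    """Return [start, end) frame ranges that together cover every frame."""
    if num_frames <= 0 or window_size <= 0 or stride <= 0:
        raise ValueError("num_frames, window_size and stride must be positive")
    if stride > window_size:
        raise ValueError("stride must not exceed window_size, or frames are skipped")
    windows = []
    start = 0
    while True:
        end = min(start + window_size, num_frames)
        windows.append((start, end))
        if end >= num_frames:
            break
        start += stride
    return windows


def blend_windows(
    frames: Sequence[float],
    process: Callable[[Sequence[float]], Sequence[float]],
    window_size: int,
    stride: int,
) -> List[float]:
    """Run `process` on each window and average the outputs where windows overlap.

    Each frame is a single float here (a stand-in for a latent); uniform
    averaging is an assumed blending rule, not taken from the paper.
    """
    totals = [0.0] * len(frames)
    counts = [0] * len(frames)
    for start, end in sliding_windows(len(frames), window_size, stride):
        outputs = process(frames[start:end])
        for offset, value in enumerate(outputs):
            totals[start + offset] += value
            counts[start + offset] += 1
    return [total / count for total, count in zip(totals, counts)]


if __name__ == "__main__":
    # Hypothetical usage: an identity "denoiser" over a 10-frame clip.
    print(sliding_windows(10, window_size=4, stride=2))
    print(blend_windows([float(i) for i in range(10)], lambda w: list(w), 4, 2))
```

Because only one window is processed at a time, peak memory depends on the window size rather than on the total video length, which is the property that makes long videos tractable in this kind of scheme.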

Abstract

Toon shading is a non-photorealistic rendering task in animation. Its primary purpose is to render objects with a flat, stylized appearance. As diffusion models have moved to the forefront of image synthesis, this paper explores a form of toon shading based on diffusion models that directly renders photorealistic videos in anime style. Existing video stylization methods face persistent challenges, notably in maintaining consistency and achieving high visual quality. In this paper, we model the toon shading problem as four subproblems: stylization, consistency enhancement, structure guidance, and colorization. To address the challenges in video stylization, we propose an effective toon shading approach called Diffutoon. Diffutoon can render remarkably detailed, high-resolution, long-duration videos in anime style. It can also edit the content according to prompts via an additional branch. The efficacy of Diffutoon is evaluated through quantitative metrics and human evaluation. Notably, Diffutoon surpasses both open-source and closed-source baseline approaches in our experiments. Our work is accompanied by the release of both the source code and example videos on GitHub (Project page: https://ecnu-cilab.github.io/DiffutoonProjectPage/).

Paper Structure (18 sections, 8 equations, 6 figures, 4 tables)

Introduction
Related Work
Stable Diffusion
Fast Sampling of Diffusion Models
Controllable Image Synthesis
Temporal Diffusion Models
Post-Processing Methods
Methodology
Toon Shading
Adding Editing Signals to Toon Shading
Synthesizing High-Resolution Long Videos
Experiments
Comparison with Baseline Methods
Case Study
Ablation Study
...and 3 more sections

Figures (6)

Figure 1: The overall architecture of Diffutoon, where the top part is the main toon shading pipeline, and the bottom part is the editing branch. The editing branch can generate editing signals in the format of color video for the main toon shading pipeline.
Figure 2: Visual comparison with other methods. The prompt used for editing is "best quality, perfect anime illustration, a girl is dancing, smile, solo, orange dress, black hair, white shoes, blue sky". Since the resolution of our generated video is extremely high, we enlarge some areas to show details. We strongly recommend that readers view the videos on our project page.
Figure 3: Intermediate results of Diffutoon. In the main toon shading pipeline, the video is synthesized according to the outline video and the color video. When the editing branch is enabled, the generated color video contains the editing signals.
Figure 4: Video rendered without outline information.
Figure 5: Video rendered without color information.
...and 1 more figure