AnimatePainter: A Self-Supervised Rendering Framework for Reconstructing Painting Process
Junjie Hu, Shuyong Gao, Qianyu Guo, Yan Wang, Qishan Wang, Yuang Feng, Wenqiang Zhang
TL;DR
AnimatePainter tackles the challenge of generating painting processes from arbitrary images without real drawing data by reframing the task as video generation. It combines a self-supervised data synthesis pipeline with a depth-guided, diffusion-based video generator, enhanced by a DF-Encoder that injects hierarchical depth information into cross-attention. Key contributions include a scalable self-supervised data generation method, depth-guided layering for process planning, and an end-to-end painting generator that produces coherent, human-like painting sequences. The approach yields realistic process videos across painting styles and demonstrates strong performance against baselines, offering a practical pathway for education, robotic painting, and creative AI applications where real process data is scarce.
Abstract
Humans can intuitively decompose an image into a sequence of strokes to create a painting, yet existing methods for generating drawing processes are limited to specific data types and often rely on expensive human-annotated datasets. We propose a novel self-supervised framework for generating drawing processes from any type of image, treating the task as a video generation problem. Our approach reverses the drawing process by progressively removing strokes from a reference image; played backward, the resulting sequence simulates a human-like creation process. Crucially, our method does not require costly datasets of real human drawing processes; instead, we leverage depth estimation and stroke rendering to construct a self-supervised dataset. We model human drawing as "refinement" and "layering" processes and introduce depth fusion layers to enable video generation models to learn and replicate human drawing behavior. Extensive experiments validate the effectiveness of our approach, demonstrating its ability to generate realistic drawings without the need for real drawing process data.
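The reverse-removal idea in the abstract can be illustrated with a toy sketch. This is not the authors' pipeline: it swaps stroke rendering for whole-layer erasure, and the helper names (`depth_layers`, `build_painting_sequence`), the "larger depth value means farther away" convention, and the background-first painting order are all assumptions made for illustration. It only shows how a depth map could order the removal of content from a finished image, and how reversing that removal gives a painting-like frame sequence.

```python
from typing import List

Grid = List[List[float]]


def depth_layers(depth: Grid, num_layers: int) -> List[List[int]]:
    """Assign each pixel a layer index: 0 = farthest, num_layers - 1 = nearest.

    Assumption: larger depth values mean farther from the viewer.
    """
    flat = [d for row in depth for d in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero on a flat depth map
    layers = []
    for row in depth:
        out = []
        for d in row:
            nearness = (hi - d) / span  # 0.0 = farthest, 1.0 = nearest
            out.append(min(int(nearness * num_layers), num_layers - 1))
        layers.append(out)
    return layers


def build_painting_sequence(image: Grid, depth: Grid,
                            num_layers: int = 3, blank: float = 0.0) -> List[Grid]:
    """Erase the image layer by layer (nearest first), then reverse the frames.

    The reversed list starts from a blank canvas and ends at the full image,
    so the farthest layer appears first (assumed background-first order).
    """
    layers = depth_layers(depth, num_layers)
    canvas = [row[:] for row in image]
    frames = [[row[:] for row in canvas]]  # frame 0 of removal: full image
    for layer in range(num_layers - 1, -1, -1):  # nearest layer is removed first
        for y, row in enumerate(layers):
            for x, idx in enumerate(row):
                if idx == layer:
                    canvas[y][x] = blank
        frames.append([row[:] for row in canvas])
    frames.reverse()  # removal order reversed = painting order
    return frames


# Toy 2x2 example: the top row is far (background), the bottom row is near.
image = [[1, 2], [3, 4]]
depth = [[1.0, 1.0], [0.0, 0.25]]
frames = build_painting_sequence(image, depth, num_layers=2)
```

In this toy example the sequence has three frames: a blank canvas, the canvas with only the far (top) row filled in, and the complete image. A fuller version would erase individual rendered strokes within each layer rather than whole layers at once.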
