Every Painting Awakened: A Training-free Framework for Painting-to-Animation Generation

Lingyu Liu; Yaxiong Wang; Li Zhu; Zhedong Zheng

TL;DR

This work tackles the problem of animating real-world paintings with image-to-video (I2V) diffusion models, which are typically trained on natural videos. It introduces a training-free framework that uses synthetic proxy images, generated by strong image diffusion models, to guide text-driven motion while preserving fidelity to the painting. The core innovations are dual-path score distillation, which distills motion priors separately from the real painting and from the synthetic proxies, and hybrid latent fusion, which combines the two via spherical linear interpolation to produce temporally coherent animations. The approach is plug-and-play, requires no additional learnable parameters, and shows consistent improvements across multiple I2V baselines in both fidelity and semantic alignment with text prompts. It also extends to natural images, and the paper discusses alternative synthesis strategies and the method's limitations, with digital art animation as the main practical application.

Abstract

We introduce a training-free framework specifically designed to bring real-world static paintings to life through image-to-video (I2V) synthesis, addressing the persistent challenge of aligning the generated motion with textual guidance while preserving fidelity to the original artworks. Existing I2V methods, primarily trained on natural video datasets, often struggle to generate dynamic outputs from static paintings, and it remains challenging to generate motion while maintaining visual consistency with the painting. This results in two distinct failure modes: static outputs, due to limited text-based motion interpretation, or distorted dynamics, caused by inadequate alignment with real-world artistic styles. We leverage the advanced text-image alignment capabilities of pre-trained image models to guide the animation process. Our approach introduces synthetic proxy images through two key innovations: (1) Dual-path score distillation: We employ a dual-path architecture to distill motion priors from both real and synthetic data, preserving static details from the original painting while learning dynamic characteristics from synthetic frames. (2) Hybrid latent fusion: We integrate hybrid features extracted from real paintings and synthetic proxy images via spherical linear interpolation in the latent space, ensuring smooth transitions and enhancing temporal consistency. Experimental evaluations confirm that our approach significantly improves semantic alignment with text prompts while faithfully preserving the unique characteristics and integrity of the original paintings. Crucially, because it achieves enhanced dynamic effects without any model training or learnable parameters, our framework enables plug-and-play integration with existing I2V methods, making it well suited to animating real-world paintings. More animated examples can be found on our project website.
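The spherical linear interpolation used in the hybrid latent fusion step can be illustrated with a minimal NumPy sketch. This is a generic slerp between two latent arrays, not the authors' implementation: the function name `slerp`, the variable names `real_latent` and `proxy_latent`, the latent shape, and the choice of the weight `t` are all assumptions made for illustration, and how the paper actually selects or schedules the weight is not stated in the abstract.

```python
import numpy as np


def slerp(z0, z1, t, eps=1e-7):
    """Spherical linear interpolation between two same-shaped latent arrays.

    Generic formula, treating each latent as one flat vector:
        slerp(z0, z1, t) = sin((1 - t) * w) / sin(w) * z0 + sin(t * w) / sin(w) * z1
    where w is the angle between z0 and z1.
    """
    a = z0.ravel().astype(np.float64)
    b = z1.ravel().astype(np.float64)

    # Cosine of the angle between the two latents (eps guards against zero norms).
    cos_omega = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps)
    omega = np.arccos(np.clip(cos_omega, -1.0, 1.0))
    sin_omega = np.sin(omega)

    if sin_omega < 1e-6:
        # Nearly parallel or antiparallel vectors: the slerp weights are
        # ill-conditioned, so fall back to plain linear interpolation.
        return (1.0 - t) * z0 + t * z1

    w0 = np.sin((1.0 - t) * omega) / sin_omega
    w1 = np.sin(t * omega) / sin_omega
    return w0 * z0 + w1 * z1


# Hypothetical usage: fuse a latent from the real painting with a latent
# from a synthetic proxy image. Shapes and the weight 0.3 are illustrative only.
rng = np.random.default_rng(0)
real_latent = rng.standard_normal((4, 8, 8))
proxy_latent = rng.standard_normal((4, 8, 8))
fused_latent = slerp(real_latent, proxy_latent, t=0.3)
```

Unlike a straight linear blend, slerp follows the arc between the two latents, which for two unit-norm inputs keeps the interpolated vector at unit norm. This is a common reason to prefer it when mixing diffusion latents, whose norms the model is sensitive to.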
