FlipSketch: Flipping Static Drawings to Text-Guided Sketch Animations

Hmrishav Bandyopadhyay, Yi-Zhe Song

TL;DR

FlipSketch is a system that brings back the magic of flip-book animation: just draw your idea and describe how you want it to move, while the result keeps the artistic essence of hand-drawn animation.

Abstract

Sketch animations offer a powerful medium for visual storytelling, from simple flip-book doodles to professional studio productions. While traditional animation requires teams of skilled artists to draw key frames and in-between frames, existing automation attempts still demand significant artistic effort through precise motion paths or keyframe specification. We present FlipSketch, a system that brings back the magic of flip-book animation: just draw your idea and describe how you want it to move! Our approach harnesses motion priors from text-to-video diffusion models, adapting them to generate sketch animations through three key innovations: (i) fine-tuning for sketch-style frame generation, (ii) a reference frame mechanism that preserves the visual integrity of the input sketch through noise refinement, and (iii) a dual-attention composition that enables fluid motion without losing visual consistency. Unlike constrained vector animations, our raster frames support dynamic sketch transformations, capturing the expressive freedom of traditional animation. The result is an intuitive system that makes sketch animation as simple as doodling and describing, while maintaining the artistic essence of hand-drawn animation.
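
The reference frame mechanism in (ii), and step (i) of Figure 1, both start from the same setup: the input sketch is inverted into a reference noise for the first frame, while the remaining frames start from standard normal noise. The sketch below illustrates only that setup, under stated assumptions. It uses deterministic DDIM inversion, which is a common choice but is not confirmed by this summary. It also relies on a hypothetical `eps_model` that stands in for the fine-tuned text-to-video noise predictor, and on hypothetical helpers `ddim_invert` and `init_video_noise`. It does not implement the $\tau_1$ noise refinement, because the exact update is not given here.

```python
import numpy as np


def ddim_invert(x0, eps_model, alphas_cumprod):
    """Deterministic DDIM inversion (assumed inversion scheme).

    Maps a clean latent x0 to a noise latent by repeatedly predicting the
    clean signal and re-noising it to the next, noisier level.
    eps_model(x, t) is a hypothetical stand-in for the noise predictor.
    alphas_cumprod[t] runs from least noisy (t = 0) to most noisy (t = T - 1).
    """
    x = x0
    for t in range(len(alphas_cumprod) - 1):
        a_t, a_next = alphas_cumprod[t], alphas_cumprod[t + 1]
        eps = eps_model(x, t)
        # Estimate the clean latent from the current noisy latent.
        x0_pred = (x - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
        # Move one step toward pure noise using the same predicted eps.
        x = np.sqrt(a_next) * x0_pred + np.sqrt(1.0 - a_next) * eps
    return x


def init_video_noise(sketch_latent, num_frames, eps_model, alphas_cumprod, seed=0):
    """Frame 0 gets the inverted sketch (reference noise); others get N(0, I)."""
    rng = np.random.default_rng(seed)
    x_ref = ddim_invert(sketch_latent, eps_model, alphas_cumprod)
    others = rng.standard_normal((num_frames - 1,) + sketch_latent.shape)
    return np.concatenate([x_ref[None], others], axis=0), x_ref


if __name__ == "__main__":
    betas = np.linspace(1e-4, 0.02, 50)       # toy linear noise schedule
    alphas_cumprod = np.cumprod(1.0 - betas)
    sketch = np.ones((4, 8, 8))               # stand-in for an encoded sketch latent
    zero_eps = lambda x, t: np.zeros_like(x)  # toy predictor, for illustration only
    noise, x_ref = init_video_noise(sketch, 16, zero_eps, alphas_cumprod)
    print(noise.shape)  # (16, 4, 8, 8)
```

With a zero noise predictor, the inversion reduces to a deterministic rescaling, which makes the behaviour easy to check. A real predictor would also be conditioned on the text prompt.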

Paper Structure

This paper contains 20 sections, 7 equations, 8 figures, 2 tables, and 1 algorithm.

Figures (8)

  • Figure 1: Model Overview: (i) During setup, we invert the input sketch to serve as the reference noise for the first frame and sample the noise for the remaining frames from a standard normal distribution. (ii) For timesteps within threshold $\tau_1$, we iteratively refine the sampled noise so that our reference noise (first frame) is denoised to the input sketch. (iii) We further compose attention maps for joint denoising of the reference and sampled noise, so that first-frame information influences all frames.
  • Figure 2: We denoise the reference noise $x^r_t$ and all frames $f^i_t$ in parallel. Query-key pairs from reference frame denoising ($q^r_t, k^r_t$) influence video generation through cross-attention with the video query-key pairs ($q^g_t, k^g_t$).
  • Figure 3: Time and compute requirements of Live-Sketch gal2024breathing and our method as the number of strokes and frames increases, respectively.
  • Figure 4: Qualitative comparison of our method against the vector animation algorithm Live-Sketch gal2024breathing and the raster video generation methods SVD blattmann2023stable and DynamiCrafter (DC) xing2023dynamicrafter. Live-Sketch gal2024breathing preserves sketch identity by constraining local animations between vectors, but has limited motion capacity. SVD blattmann2023stable and DC xing2023dynamicrafter cannot preserve sketch identity, suffering from the sketch-photo domain gap. Our method produces dynamic animations that align with text prompts without losing sketch identity.
  • Figure 5: Frame extrapolation allows us to construct complex animations by stitching multiple videos with different text prompts.
  • ...and 3 more figures
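
Figure 2 and step (iii) of Figure 1 describe composing attention so that the parallel reference-denoising branch ($q^r_t, k^r_t$) steers every generated frame. This summary does not give the exact composition rule, so the following is only an illustrative sketch and not FlipSketch's formula. It assumes per-frame spatial attention and linearly blends each frame's own attention map with the reference frame's map using a hypothetical weight `alpha`. The function names are likewise hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention_map(q, k):
    """Scaled dot-product attention weights with shape (..., n_q, n_k)."""
    d = q.shape[-1]
    return softmax(q @ np.swapaxes(k, -1, -2) / np.sqrt(d))


def composed_attention(q_g, k_g, v_g, q_r, k_r, alpha=0.5):
    """Blend each frame's attention map with the reference frame's map (assumed rule).

    q_g, k_g, v_g: (frames, tokens, dim) from the video denoising branch.
    q_r, k_r:      (tokens, dim) from the parallel reference denoising branch.
    alpha:         hypothetical blend weight; 0 means no reference influence.
    """
    a_gen = attention_map(q_g, k_g)            # (frames, tokens, tokens)
    a_ref = attention_map(q_r, k_r)[None]      # (1, tokens, tokens), shared by all frames
    a = (1.0 - alpha) * a_gen + alpha * a_ref  # convex blend keeps rows summing to 1
    return a @ v_g, a                          # values still come from each frame


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames, tokens, dim = 4, 16, 8
    q_g, k_g, v_g = (rng.standard_normal((frames, tokens, dim)) for _ in range(3))
    q_r, k_r = rng.standard_normal((tokens, dim)), rng.standard_normal((tokens, dim))
    out, a = composed_attention(q_g, k_g, v_g, q_r, k_r, alpha=0.3)
    print(out.shape)  # (4, 16, 8)
```

Because both maps are row-stochastic, any convex blend is still a valid attention distribution. Setting `alpha=0` recovers ordinary per-frame self-attention, and larger values pull every frame's spatial structure toward the reference sketch.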