How Animals Dance (When You're Not Looking)
Xiaojuan Wang, Aleksander Holynski, Brian Curless, Ira Kemelmacher, Steve Seitz
TL;DR
The paper addresses the challenge of generating long, structured, music-synchronized animal dance videos, which existing models struggle to produce without extensive animal-specific motion data. It introduces choreography patterns, extracted from human dance videos, as a high-level control signal; augments a small set of input keyframes with mirrored poses; and uses graph optimization over a directed graph of keyframe pairs to map beat-aligned motion segments to those pairs, before producing the final video via diffusion and beat-warping. The result is up to 30 seconds of animal dance across many species. A user study and quantitative evaluations show improvements in appearance and visual quality over baselines, highlighting the method's practical potential for entertainment and zoological analysis alike.
Abstract
We present a framework for generating music-synchronized, choreography-aware animal dance videos. Our framework introduces choreography patterns -- structured sequences of motion beats that define the long-range structure of a dance -- as a novel high-level control signal for dance video generation. These patterns can be automatically estimated from human dance videos. Starting from a few keyframes representing distinct animal poses, generated via text-to-image prompting or GPT-4o, we formulate dance synthesis as a graph optimization problem that seeks the optimal keyframe structure to satisfy a specified choreography pattern of beats. We also introduce an approach for mirrored pose image generation, essential for capturing symmetry in dance. In-between frames are synthesized using a video diffusion model. With as few as six input keyframes, our method can produce dance videos of up to 30 seconds across a wide range of animals and music tracks.
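To make the graph formulation concrete, here is a toy sketch in Python of one way a choreography pattern could be mapped onto keyframe pairs. It is not the authors' optimization. The label notation (a trailing `'` marks the mirrored version of a segment), the `mirror` lookup table, the specific constraints, and the brute-force backtracking search are all assumptions made for illustration. Each pattern label is assigned a directed edge (a keyframe pair) so that consecutive segments chain end to start, a repeated label reuses the same edge, and a mirrored label uses the mirrored edge.

```python
from typing import Dict, List, Optional, Tuple

Edge = Tuple[int, int]  # (start keyframe index, end keyframe index)


def assign_pattern(
    pattern: List[str],
    num_keyframes: int,
    mirror: Dict[int, int],
) -> Optional[List[Edge]]:
    """Return one edge per pattern entry, or None if no assignment exists.

    Illustrative constraints (assumptions, not taken from the paper):
      * each edge starts at the keyframe where the previous edge ended;
      * a repeated label always maps to the same edge;
      * different labels map to different edges;
      * "X'" maps to the mirror image of the edge assigned to "X".
    `mirror` must be an involution: mirror[mirror[i]] == i.
    """
    label_to_edge: Dict[str, Edge] = {}
    walk: List[Edge] = []

    def mirrored(edge: Edge) -> Edge:
        return (mirror[edge[0]], mirror[edge[1]])

    def partner(label: str) -> str:
        return label[:-1] if label.endswith("'") else label + "'"

    def consistent(label: str, edge: Edge) -> bool:
        if edge[0] == edge[1]:
            return False  # a segment must move between two distinct poses
        if label in label_to_edge:
            return label_to_edge[label] == edge
        if edge in label_to_edge.values():
            return False  # edge already used by a different label
        other = partner(label)
        if other in label_to_edge and label_to_edge[other] != mirrored(edge):
            return False
        return True

    def search(t: int, start: Optional[int]) -> bool:
        if t == len(pattern):
            return True
        label = pattern[t]
        starts = range(num_keyframes) if start is None else [start]
        for u in starts:
            for v in range(num_keyframes):
                edge = (u, v)
                if not consistent(label, edge):
                    continue
                newly_assigned = label not in label_to_edge
                if newly_assigned:
                    label_to_edge[label] = edge
                walk.append(edge)
                if search(t + 1, v):
                    return True
                walk.pop()  # undo this choice and try the next edge
                if newly_assigned:
                    del label_to_edge[label]
        return False

    return list(walk) if search(0, None) else None


# Keyframes 0-2 are original poses; 3-5 are their mirrored counterparts.
mirror = {0: 3, 1: 4, 2: 5, 3: 0, 4: 1, 5: 2}
walk_edges = assign_pattern(["A", "B", "A'", "B'"], 6, mirror)
print(walk_edges)
```

In this toy setup the pattern `A B A' B'` resolves to a closed loop through two poses and their mirrors, which shows why mirrored keyframes matter for symmetric choreography. A fuller system would presumably also score candidate edges, for example by how plausible the motion between the two poses is, instead of accepting the first feasible walk.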
