MotionDreamer: Exploring Semantic Video Diffusion features for Zero-Shot 3D Mesh Animation
Lukas Uzolas, Elmar Eisemann, Petr Kellnhofer
TL;DR
MotionDreamer addresses the challenge of re-animating unseen 3D meshes without target-domain training by using semantic features from pre-trained video diffusion models (VDMs) to guide pose fitting on explicit mesh representations. The approach textures a given mesh, generates a motion sequence via VDM-conditioned rendering, and optimizes per-frame pose offsets by matching semantic diffusion features across frames. Evaluations across two VDM backbones and four animation models show favorable motion quality in a user study and competitive pose-fitting accuracy, with reduced runtime compared to end-to-end 4D methods. The work enables fast, zero-shot re-animation of diverse assets within standard graphics pipelines and opens avenues for diffusion-guided motion analysis and asset authoring.
Abstract
Animation techniques bring digital 3D worlds and characters to life. However, manual animation is tedious, and automated techniques are often specialized to narrow shape classes. In our work, we propose a technique for automatic re-animation of various 3D shapes based on a motion prior extracted from a video diffusion model. Unlike existing 4D generation methods, we focus solely on the motion, and we leverage an explicit mesh-based representation compatible with existing computer-graphics pipelines. Furthermore, our use of diffusion features enhances the accuracy of our motion fitting. We analyze the efficacy of these features for animation fitting, and we experimentally validate our approach for two different diffusion models and four animation models. Finally, we demonstrate in a user study that our time-efficient zero-shot method achieves superior performance in re-animating a diverse set of 3D shapes when compared to existing techniques. The project website is located at https://lukas.uzolas.com/MotionDreamer.
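To make the idea of fitting per-frame pose offsets by matching features concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes a hypothetical linear map `A` standing in for the whole "pose to rendered frame to diffusion feature" chain, and it minimizes a squared feature distance by plain gradient descent. The names `fit_offsets`, `A`, `base_pose`, and `target_feats` are illustrative only; in the actual method the features come from a video diffusion model and the mapping is nonlinear.

```python
import numpy as np

def fit_offsets(A, base_pose, target_feats, steps=500, lr=None):
    """Fit one pose offset per frame by gradient descent on a squared feature loss.

    A            : (F, P) hypothetical linear stand-in for the feature extractor.
    base_pose    : (P,)   rest-pose parameters shared by all frames.
    target_feats : (T, F) features observed in the T target frames.
    Returns an array of shape (T, P) holding one offset per frame.
    """
    if lr is None:
        # Step size 1 / L, where L = 2 * sigma_max(A)^2 bounds the gradient's slope.
        lr = 1.0 / (2.0 * np.linalg.norm(A, 2) ** 2)
    offsets = np.zeros((target_feats.shape[0], base_pose.shape[0]))
    for _ in range(steps):
        # Features predicted for every frame, shape (T, F).
        pred = (base_pose + offsets) @ A.T
        residual = pred - target_feats
        # Gradient of sum ||residual||^2 with respect to the offsets is 2 * residual @ A.
        offsets -= lr * 2.0 * residual @ A
    return offsets
```

In this toy setting the loss is convex, so the offsets converge to a least-squares solution. The real problem is non-convex, which is one reason the choice of features matters for fitting accuracy.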
