AnimaMimic: Imitating 3D Animation from Video Priors

Tianyi Xie, Yunuo Chen, Yaowei Guo, Yin Yang, Bolei Zhou, Demetri Terzopoulos, Ying Jiang, Chenfanfu Jiang

TL;DR

The paper addresses the labor-intensive nature of high-quality 3D animation and the lack of explicit 3D structure in 2D video diffusion outputs. It introduces AnimaMimic, which animates explicit 3D meshes by automatically generating a skeleton, supervising joint motion with monocular video priors via differentiable rendering, and refining deformation with differentiable soft-tissue dynamics. The approach combines rigging-based pose optimization with a physics-based refinement stage to ensure temporally coherent, physically plausible, and artist-editable motion. This framework enables smoother integration into standard animation pipelines and demonstrates improved motion fidelity and cross-view consistency over baselines.

Abstract

Creating realistic 3D animation remains a time-consuming and expertise-dependent process, requiring manual rigging, keyframing, and fine-tuning of complex motions. Meanwhile, video diffusion models have recently demonstrated remarkable motion imagination in 2D, generating dynamic and visually coherent motion from text or image prompts. However, their results lack explicit 3D structure and cannot be directly used for animation or simulation. We present AnimaMimic, a framework that animates static 3D meshes using motion priors learned from video diffusion models. Starting from an input mesh, AnimaMimic synthesizes a monocular animation video, automatically constructs a skeleton with skinning weights, and refines joint parameters through differentiable rendering and video-based supervision. To further enhance realism, we integrate a differentiable simulation module that refines mesh deformation through physically grounded soft-tissue dynamics. Our method bridges the creativity of video diffusion and the structural control of 3D rigged animation, producing physically plausible, temporally coherent, and artist-editable motion sequences that integrate seamlessly into standard animation pipelines. Our project page is at: https://xpandora.github.io/AnimaMimic/
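The abstract mentions a skeleton with skinning weights but does not say which skinning formulation is used. As a rough illustration only, the sketch below assumes linear blend skinning, a common choice, and works in 2D for brevity. It covers just the forward deformation step; in the paper the joint parameters are optimized through differentiable rendering, which is not shown here. All function and variable names (`rotate_about`, `blend_skin`, the toy arm) are hypothetical and do not come from the paper.

```python
import math

def rotate_about(point, pivot, angle):
    """Rotate a 2D point around a pivot by `angle` radians."""
    px, py = point[0] - pivot[0], point[1] - pivot[1]
    c, s = math.cos(angle), math.sin(angle)
    return (pivot[0] + c * px - s * py, pivot[1] + s * px + c * py)

def blend_skin(vertices, weights, pivots, angles):
    """Linear blend skinning (assumed formulation): each deformed vertex is
    the weighted sum of its positions under every joint's rigid transform."""
    deformed = []
    for v, w_row in zip(vertices, weights):
        x = y = 0.0
        for w, pivot, angle in zip(w_row, pivots, angles):
            tx, ty = rotate_about(v, pivot, angle)
            x += w * tx
            y += w * ty
        deformed.append((x, y))
    return deformed

# Hypothetical two-joint "arm" lying on the x-axis.
# Joint 0 sits at the origin and stays fixed; joint 1 is an elbow at x = 1.
vertices = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
# One row per vertex, one column per joint; each row sums to 1.
weights = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
pivots = [(0.0, 0.0), (1.0, 0.0)]
angles = [0.0, math.pi / 2]  # bend the elbow by 90 degrees

posed = blend_skin(vertices, weights, pivots, angles)
```

In this toy setup the tip vertex follows the elbow rotation, while the vertex at the elbow itself does not move because both joint transforms leave it in place. Because the deformation is a smooth function of the joint angles, a formulation like this can in principle be driven by gradient-based optimization, which is the role the abstract assigns to differentiable rendering.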

Paper Structure

This paper contains 35 sections, 20 equations, 9 figures, 2 tables.

Figures (9)

  • Figure 1: Animated Creatures. By optimizing joint articulations and material parameters from videos, our method generates realistic dynamics for objects with diverse geometries.
  • Figure 2: Pipeline Overview. From an input 3D mesh, we render a canonical view and use a video diffusion model to generate a monocular motion sequence. We construct a skeleton with skinning weights using a feed-forward rigging model and generate animation by optimizing joint motions through differentiable rendering, tracking, and depth cues. Finally, we refine mesh deformation via differentiable simulation to obtain physically grounded and temporally consistent results. The circles on the right show novel views.
  • Figure 3: Qualitative Comparison. In comparison with SC4D [wu2024sc4d], DreamMesh4D [li2024dreammesh4d], and Puppeteer [song2025puppeteer], our method yields more coherent motion trajectories and more accurately reflects the dynamics present in the reference videos.
  • Figure 4: Novel View Synthesis. Our method closely aligns with the reference video in the input view and produces coherent novel views, whereas the baseline methods deviate from the ground truth.
  • Figure 5: Ablation Studies. We conduct ablation studies on the proposed loss terms during optimization. Incorporating these terms leads to more plausible motion and enables the reconstructed dynamics to more faithfully adhere to the input video.
  • ...and 4 more figures