Articulated Kinematics Distillation from Video Diffusion Models

Xuan Li, Qianli Ma, Tsung-Yi Lin, Yongxin Chen, Chenfanfu Jiang, Ming-Yu Liu, Donglai Xiang

TL;DR

This paper introduces Articulated Kinematics Distillation (AKD), a framework that distills articulated motions from large video diffusion priors into low-DoF, skeleton-based representations for rigged 3D assets. By integrating 3D Gaussian Splatting with differentiable rendering and Score Distillation Sampling, AKD achieves high 3D shape consistency and expressive motion while remaining amenable to physics-based grounding via motion tracking. Key innovations include rigging transfer to Gaussian kernels, ground-aware rendering with a checkerboard plane, and a differentiable physics loop for grounding distilled motions in simulations. Experiments show superior 3D consistency, more plausible articulated motion, and favorable user preferences compared with text-to-4D baselines, demonstrating AKD’s potential for scalable, text-driven animation pipelines and robotics-relevant data generation.

Abstract

We present Articulated Kinematics Distillation (AKD), a framework for generating high-fidelity character animations by merging the strengths of skeleton-based animation and modern generative models. AKD uses a skeleton-based representation for rigged 3D assets, drastically reducing the Degrees of Freedom (DoFs) by focusing on joint-level control, which allows for efficient, consistent motion synthesis. Through Score Distillation Sampling (SDS) with pre-trained video diffusion models, AKD distills complex, articulated motions while maintaining structural integrity, overcoming challenges faced by 4D neural deformation fields in preserving shape consistency. This approach is naturally compatible with physics-based simulation, ensuring physically plausible interactions. Experiments show that AKD achieves superior 3D consistency and motion quality compared with existing works on text-to-4D generation. Project page: https://research.nvidia.com/labs/dir/akd/
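The DoF reduction described above can be illustrated with a small sketch. It assumes a planar chain of bones and binds each point (standing in for a Gaussian kernel center) rigidly to a single bone; the function names, the 2D setting, and the single-bone binding are illustrative assumptions, not the paper's actual rigging or skinning scheme.

```python
import math

def bone_frames(joint_angles, bone_lengths):
    """Forward kinematics for a planar chain.

    Returns one (origin, world_angle) pair per bone. Each joint angle is
    relative to the parent bone, so world angles accumulate along the chain.
    """
    frames = []
    x, y, angle = 0.0, 0.0, 0.0
    for theta, length in zip(joint_angles, bone_lengths):
        angle += theta
        frames.append(((x, y), angle))
        # The next bone starts at the tip of the current bone.
        x += length * math.cos(angle)
        y += length * math.sin(angle)
    return frames

def pose_points(local_points, bone_ids, joint_angles, bone_lengths):
    """Pose points that are rigidly attached to bones (assumed binding).

    local_points[i] is expressed in the frame of bone bone_ids[i]; the
    output is the same point in world coordinates for the given pose.
    """
    frames = bone_frames(joint_angles, bone_lengths)
    posed = []
    for (px, py), bone in zip(local_points, bone_ids):
        (ox, oy), a = frames[bone]
        c, s = math.cos(a), math.sin(a)
        posed.append((ox + c * px - s * py, oy + s * px + c * py))
    return posed
```

With two bones, one frame of motion is described by two joint angles no matter how many points are attached, whereas a free-form deformation of N points in this 2D setting would need 2N coordinates per frame. For example, `pose_points([(1.0, 0.0)], [1], [math.pi / 2, -math.pi / 2], [1.0, 1.0])` places the single attached point at the tip of the second bone.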

Paper Structure

This paper contains 41 sections, 17 equations, 11 figures, 2 tables.

Figures (11)

  • Figure 1: By incorporating articulation into static assets, AKD synthesizes realistic motions distilled from large video diffusion models.
  • Figure 2: Pipeline. We incorporate articulated skeletons into generative motion synthesis. With a low-dimensional parameterization of motion (a sequence of joint angles for articulated bones), the synthesis can focus on motion modes rather than local-scale deformations. Given a text prompt, we use a text-to-3D method to generate a 3D asset. The asset is deformed by the skeleton and differentiably rendered into videos. The SDS gradient is evaluated by a pre-trained video diffusion transformer and backpropagated to the joint angles.
  • Figure 3: Qualitative comparisons with TC4D. The blurry artifacts generated by TC4D are highlighted. TC4D often fails to produce alternating leg movements (e.g., in the astronaut example), or shows limited local-scale motion (e.g., in the T-Rex example).
  • Figure 4: Examples of our synthesized motions.
  • Figure 5: We use physics-based motion tracking to project synthesized motions onto physics-grounded trajectories.
  • ...and 6 more figures
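
The optimization loop described in the Figure 2 caption (render, add noise, query a denoiser, backpropagate the SDS gradient to joint angles) can be sketched in a toy setting. Everything concrete here is an assumption made for illustration: the "renderer" is the tip position of a two-link planar chain rather than Gaussian splatting, `toy_denoiser` is a closed-form stand-in for a video diffusion model whose data distribution is a single point `target`, and the weighting `w = 1 - alpha_bar` is one common choice rather than the paper's.

```python
import math
import random

def render(theta):
    """Toy differentiable 'renderer': tip of a two-link chain with unit bones."""
    a, b = theta
    return [math.cos(a) + math.cos(a + b), math.sin(a) + math.sin(a + b)]

def render_jacobian(theta):
    """Analytic Jacobian d(render)/d(theta), rows = outputs, cols = angles."""
    a, b = theta
    return [[-math.sin(a) - math.sin(a + b), -math.sin(a + b)],
            [math.cos(a) + math.cos(a + b), math.cos(a + b)]]

def toy_denoiser(x_t, alpha_bar, target):
    """Stand-in noise predictor (assumption): exact when all data equals target."""
    s, n = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(xt - s * tg) / n for xt, tg in zip(x_t, target)]

def sds_gradient(theta, target, rng):
    """One SDS gradient sample: w * (eps_hat - eps) pulled back through the renderer."""
    alpha_bar = rng.uniform(0.1, 0.9)            # random noise level
    s, n = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    eps = [rng.gauss(0.0, 1.0) for _ in range(2)]
    x = render(theta)
    x_t = [s * xi + n * e for xi, e in zip(x, eps)]   # forward diffusion
    eps_hat = toy_denoiser(x_t, alpha_bar, target)
    w = 1.0 - alpha_bar                          # assumed weighting
    residual = [w * (eh - e) for eh, e in zip(eps_hat, eps)]
    jac = render_jacobian(theta)
    # Chain rule: the denoiser itself is not differentiated, as in SDS.
    return [sum(residual[i] * jac[i][j] for i in range(2)) for j in range(2)]

def distill(theta, target, steps=400, lr=0.3, seed=0):
    """Gradient descent on joint angles using SDS gradient samples."""
    rng = random.Random(seed)
    theta = list(theta)
    for _ in range(steps):
        grad = sds_gradient(theta, target, rng)
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta
```

Calling `distill([0.1, 0.5], [1.0, 1.0])` drives the chain so that `render` of the result lands close to the target point. In this toy case the sampled noise cancels in `eps_hat - eps`, so the update reduces to a scaled pull toward the target; with a learned video model the gradient is noisy and encodes the text-conditioned motion prior instead.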