SMF: Template-free and Rig-free Animation Transfer using Kinetic Codes
Sanjeev Muralikrishnan, Niladri Shekhar Dutt, Niloy J. Mitra
TL;DR
SMF tackles the challenge of transferring coarse motion signals to dense 3D character meshes without relying on templates or deformation rigs. It introduces Kinetic Codes, a temporally aware latent space learned from sparse motion via a multi-headed attention autoencoder, and couples this with spatial and temporal gradient predictors and a differentiable Poisson solver to produce temporally coherent mesh sequences from a rest shape $X_0$. Temporal coherence is further enforced by an Augmented Neural ODE that predicts corrective Jacobians over motion windows, enabling robust long-sequence animation. Across AMASS, Mixamo, D4D, and monocular video, SMF generalizes well to unseen motions and shapes, achieves state-of-the-art results on AMASS, and produces realistic transfers to stylized and non-human characters, with potential for real-time applications.
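The TL;DR mentions a differentiable Poisson solver that turns predicted per-face gradients into vertex positions on the mesh of the rest shape $X_0$. As a hedged illustration only, the NumPy sketch below shows one standard way to pose such a solve: build the per-face gradient operator $G$ and face areas on the rest mesh, then solve the area-weighted least-squares system $G^\top M G X = G^\top M T$ for target gradients $T$, pinning one vertex to remove the translation null space. The function names, the layout of the target gradients (rows $3f..3f+2$ hold face $f$'s gradients, one column per output coordinate), and the single pinned anchor are our assumptions, not the authors' implementation; a training-time solver would instead use a sparse, batched factorization inside an autodiff framework so that gradients flow through the solve.

```python
import numpy as np

def triangle_gradient_operator(V, F):
    """Hypothetical helper: dense per-face gradient operator G (3*nF x nV)
    for piecewise-linear functions on a triangle mesh, plus per-face areas."""
    nV, nF = V.shape[0], F.shape[0]
    G = np.zeros((3 * nF, nV))
    areas = np.zeros(nF)
    for f, (i, j, k) in enumerate(F):
        vi, vj, vk = V[i], V[j], V[k]
        n = np.cross(vj - vi, vk - vi)
        dbl_area = np.linalg.norm(n)          # twice the triangle area (assumed > 0)
        n_hat = n / dbl_area
        areas[f] = 0.5 * dbl_area
        # Gradient of the hat function at a vertex: n_hat x (opposite edge) / (2A).
        for vid, (a, b) in zip((i, j, k), ((vj, vk), (vk, vi), (vi, vj))):
            G[3 * f:3 * f + 3, vid] = np.cross(n_hat, b - a) / dbl_area
    return G, areas

def poisson_solve(V_rest, F, target_grads, anchor_idx=0, anchor_pos=None):
    """Hypothetical helper: find vertex positions X whose per-face gradients
    best match target_grads (3*nF x 3) in the area-weighted least-squares sense,
    with vertex `anchor_idx` pinned to `anchor_pos`."""
    G, areas = triangle_gradient_operator(V_rest, F)
    w = np.repeat(areas, 3)[:, None]          # area weight for every gradient row
    L = G.T @ (w * G)                         # nV x nV cotangent-Laplacian-like matrix
    rhs = G.T @ (w * target_grads)            # nV x 3 divergence of target field
    if anchor_pos is None:
        anchor_pos = V_rest[anchor_idx]
    anchor_pos = np.asarray(anchor_pos, dtype=float)
    free = np.arange(V_rest.shape[0]) != anchor_idx
    # Move the pinned vertex's contribution to the right-hand side.
    rhs_free = rhs[free] - L[np.ix_(free, ~free)] @ anchor_pos[None, :]
    X = np.empty(V_rest.shape, dtype=float)
    X[~free] = anchor_pos
    X[free] = np.linalg.solve(L[np.ix_(free, free)], rhs_free)
    return X

if __name__ == "__main__":
    # Tetrahedron surface; targets are gradients of a rigidly moved copy.
    V = np.array([[0.0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]])
    F = np.array([[0, 2, 1], [0, 1, 3], [0, 3, 2], [1, 2, 3]])
    c, s = np.cos(0.5), np.sin(0.5)
    R = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])
    target = V @ R.T + np.array([0.3, -0.2, 1.0])
    G, _ = triangle_gradient_operator(V, F)
    X = poisson_solve(V, F, G @ target, anchor_idx=0, anchor_pos=target[0])
    print("max abs error:", np.abs(X - target).max())
```

Because the target gradients in the demo come from an exactly realizable deformation, the least-squares residual is zero and the solve recovers the moved mesh; with network-predicted gradients, the same solve returns the closest integrable surface, which is why gradient-domain prediction followed by a Poisson step is a common design for mesh deformation.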
Abstract
Animation retargeting applies a sparse motion description (e.g., keypoint sequences) to a character mesh to produce a semantically plausible and temporally coherent full-body mesh sequence. Existing approaches come with restrictions: they require access to template-based shape priors or artist-designed deformation rigs, suffer from limited generalization to unseen motions and/or shapes, or exhibit motion jitter. We propose Self-supervised Motion Fields (SMF), a self-supervised framework that is trained with only sparse motion representations, without requiring dataset-specific annotations, templates, or rigs. At the heart of our method are Kinetic Codes, a novel autoencoder-based sparse motion encoding that exposes a semantically rich latent space, simplifying large-scale training. Our architecture comprises dedicated spatial and temporal gradient predictors, which are jointly trained in an end-to-end fashion. The combined network, regularized by the Kinetic Codes' latent space, generalizes well to both unseen shapes and new motions. We evaluated our method on unseen motions sampled from AMASS, D4D, Mixamo, and raw monocular video for animation transfer on various characters with varying shapes and topologies. We report a new SoTA on the AMASS dataset in the context of generalization to unseen motion. Code, weights, and supplementary material are available on the project webpage at https://motionfields.github.io/
