NECromancer: Breathing Life into Skeletons via BVH Animation

Mingxi Xu, Qi Wang, Zhengyu Wen, Phong Dao Thien, Zhengyu Li, Ning Zhang, Xiaoyu He, Wei Zhao, Kehong Gong, Mingyuan Zhang

TL;DR

This work addresses universal motion understanding across diverse morphologies by introducing NECromancer, a topology-invariant tokenizer for BVH motion. It combines OwO, a graph-based, ontology-aware skeletal encoder, with TAT, a topology-agnostic tokenizer that uses a virtual joint and residual vector quantization (RVQ) to produce discrete tokens independent of skeleton topology. Both are trained and evaluated on the Unified BVH Universe (UvU) dataset of 47,807 sequences. The approach supports high-fidelity reconstruction under compression, cross-skeleton motion transfer, and text-motion retrieval, and generalizes across human, animal, and fantasy morphologies. The framework is positioned as a scalable, cross-species foundation for 4D animation and motion synthesis, with potential applications in cross-domain content creation and robotics; open challenges include computational cost and data richness.
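
To make the RVQ step mentioned above concrete, here is a minimal pure-Python sketch of residual vector quantization in general. It is not the TAT implementation: the function names (`nearest`, `rvq_encode`, `rvq_decode`) and the toy codebooks are hypothetical, and the codebooks are fixed by hand rather than learned. Each stage quantizes whatever the previous stages left unexplained, so a vector becomes a short list of codebook indices.

```python
def nearest(codebook, vec):
    """Index of the codebook entry closest to vec (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((c - v) ** 2 for c, v in zip(codebook[k], vec)),
    )


def rvq_encode(vec, codebooks):
    """Quantize vec stage by stage; returns one index per codebook."""
    residual = list(vec)
    indices = []
    for codebook in codebooks:
        k = nearest(codebook, residual)
        indices.append(k)
        # The next stage only sees what this stage failed to explain.
        residual = [r - c for r, c in zip(residual, codebook[k])]
    return indices


def rvq_decode(indices, codebooks):
    """Reconstruct the vector as the sum of the selected codebook entries."""
    out = [0.0] * len(codebooks[0][0])
    for k, codebook in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, codebook[k])]
    return out


# Toy, hand-written codebooks (assumption: real codebooks are learned).
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],      # coarse stage
    [[0.0, 0.0], [0.25, -0.25]],   # refinement stage
]
tokens = rvq_encode([1.2, 0.8], codebooks)
approx = rvq_decode(tokens, codebooks)
```

In this toy case the coarse stage picks `[1.0, 1.0]` and the refinement stage picks `[0.25, -0.25]`, so the reconstruction is closer to the input than the coarse stage alone would give.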

Abstract

Motion tokenization is a key component of generalizable motion models, yet most existing approaches are restricted to species-specific skeletons, limiting their applicability across diverse morphologies. We propose NECromancer (NEC), a universal motion tokenizer that operates directly on arbitrary BVH skeletons. NEC consists of three components: (1) an Ontology-aware Skeletal Graph Encoder (OwO) that encodes structural priors from BVH files, including joint semantics, rest-pose offsets, and skeletal topology, into skeletal embeddings; (2) a Topology-Agnostic Tokenizer (TAT) that compresses motion sequences into a universal, topology-invariant discrete representation; and (3) the Unified BVH Universe (UvU), a large-scale dataset aggregating BVH motions across heterogeneous skeletons. Experiments show that NEC achieves high-fidelity reconstruction under substantial compression and effectively disentangles motion from skeletal structure. The resulting token space supports cross-species motion transfer, composition, denoising, generation with token-based models, and text-motion retrieval, establishing a unified framework for motion analysis and synthesis across diverse morphologies. Demo page: https://animotionlab.github.io/NECromancer/

Paper Structure (52 sections, 1 theorem, 29 equations, 6 figures, 5 tables)

Key Result

Theorem 1

If a model can correctly determine the LCA for any pair of nodes $(i, j)$ in a tree, then the entire tree topology can be uniquely reconstructed.
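
One way to see why the theorem holds for a rooted tree is sketched below. This is an illustrative construction under stated assumptions, not necessarily the paper's proof: the helper names (`make_lca_oracle`, `reconstruct_parents`) are hypothetical, and nodes are assumed to be labeled 0..n-1. The idea is that node i is a proper ancestor of j exactly when LCA(i, j) = i with i ≠ j, and the parent of j is its deepest proper ancestor.

```python
def make_lca_oracle(parents):
    """Build an LCA function from a ground-truth parent array (-1 marks the root)."""
    def path_to_root(v):
        path = []
        while v != -1:
            path.append(v)
            v = parents[v]
        return path

    def lca(i, j):
        on_path = set(path_to_root(i))
        # Walking up from j, the first node also above i is the LCA.
        for v in path_to_root(j):
            if v in on_path:
                return v

    return lca


def reconstruct_parents(n, lca):
    """Recover the parent array using only LCA queries."""
    # i is a proper ancestor of j exactly when i != j and lca(i, j) == i.
    ancestors = {
        j: [i for i in range(n) if i != j and lca(i, j) == i] for j in range(n)
    }
    parents = []
    for j in range(n):
        if not ancestors[j]:
            parents.append(-1)  # no proper ancestors: j is the root
        else:
            # Proper ancestors of j form a chain; the parent is the deepest one,
            # i.e. the one with the most proper ancestors of its own.
            parents.append(max(ancestors[j], key=lambda a: len(ancestors[a])))
    return parents


truth = [-1, 0, 0, 1, 1, 2]
recovered = reconstruct_parents(len(truth), make_lca_oracle(truth))
```

Because every parent link is recovered, the full topology follows; the sketch uses O(n²) LCA queries and makes no attempt at efficiency.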

Figures (6)

  • Figure 1: Overview of the Unified BVH Universe dataset pipeline. Motion data from three existing datasets are unified into a standardized representation, including BVH files, base-pose meshes, skinning weights, and text annotations. Data filtering and smoothing are applied to ensure physical plausibility. During training, on-the-fly augmentations are used to further increase data diversity.
  • Figure 2: Overview of NECromancer (NEC). NEC consists of two main components: (a) Ontology-aware Skeletal Graph Encoder (OwO), which encodes static skeletal information (topology, joint names, rest pose) into structured graph-based joint features; (b) Topology-Agnostic Tokenizer (TAT), including Spatio-Temporal Encoder and Decoder, which maps motion sequences into a unified feature space, appends virtual joints, and converts them into discrete motion tokens.
  • Figure 3: Qualitative reconstruction results comparing NEC with ground truth on Objaverse-XL and Truebones.
  • Figure 4: Overview of the BVH motion format. BVH encodes a skeleton as a joint hierarchy with fixed rest-pose offsets (OFFSET), and represents motion as per-frame channels (TRANSLATION for the root and ROTATION for all joints) applied in hierarchical order.
  • Figure 5: Cross-skeleton motion distance correlation under topology transfer. Each dot corresponds to a pairwise motion distance computed on the source skeleton (x-axis) and the corresponding distance after retargeting to a target skeleton (y-axis). Strong positive correlations (Pearson $r$) indicate that NEC preserves motion semantics under cross-topology transfer.
  • ...and 1 more figure
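
The BVH convention described in the Figure 4 caption (fixed rest-pose offsets plus per-frame rotations applied in hierarchical order) can be sketched as a small forward-kinematics routine. This is a simplified illustration, not code from the paper: the function names are hypothetical, rotations are assumed to be already converted from Euler channels into 3x3 matrices, the root position is assumed to be its offset plus the translation channels, and joints are assumed to be listed parent before child.

```python
import math


def rot_z(deg):
    """3x3 rotation matrix about the Z axis (angle in degrees)."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(a, b):
    """Product of two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def matvec(a, v):
    """Product of a 3x3 matrix and a 3-vector."""
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]


def forward_kinematics(parents, offsets, local_rots, root_translation):
    """World-space joint positions for a single frame.

    parents[j] is the parent index of joint j (-1 for the root).
    """
    n = len(parents)
    world_rot = [None] * n
    world_pos = [None] * n
    for j in range(n):
        p = parents[j]
        if p < 0:
            world_rot[j] = local_rots[j]
            world_pos[j] = [o + t for o, t in zip(offsets[j], root_translation)]
        else:
            # A child's offset is expressed in its parent's rotated frame.
            world_rot[j] = matmul(world_rot[p], local_rots[j])
            rotated = matvec(world_rot[p], offsets[j])
            world_pos[j] = [a + b for a, b in zip(world_pos[p], rotated)]
    return world_pos


# Three-joint chain along +X; rotating the root 90 degrees about Z
# swings the whole chain onto the +Y axis.
positions = forward_kinematics(
    parents=[-1, 0, 1],
    offsets=[[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
    local_rots=[rot_z(90), rot_z(0), rot_z(0)],
    root_translation=[0.0, 0.0, 0.0],
)
```

Real BVH files also specify a per-joint channel order for the Euler angles, which this sketch deliberately leaves out.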

Theorems & Definitions (2)

  • Theorem
  • Proof: Constructive proof