One Model to Rig Them All: Diverse Skeleton Rigging with UniRig

Jia-Peng Zhang, Cheng-Feng Pu, Meng-Hao Guo, Yan-Pei Cao, Shi-Min Hu

TL;DR

UniRig tackles the challenge of rigging a highly diverse set of 3D models by introducing a unified, two-stage framework that first autoregressively generates a topologically valid skeleton tree and then predicts skinning weights conditioned on that skeleton. A novel Skeleton Tree Tokenization encodes hierarchical bone relationships and bone types into compact sequences, enabling a robust skeleton generation process, while a Bone-Point Cross Attention module links the generated skeleton to mesh geometry for accurate skinning. The approach is trained on Rig-XL, a large-scale diverse rigged-model dataset, and refined with VRoid data to handle fine-grained character details, achieving state-of-the-art results on skeleton and weight prediction and demonstrating strong mesh-deformation robustness under animation. The practical impact includes accelerated auto-rigging workflows, human-in-the-loop editing capabilities, and reliable animation for a broad range of object categories, from anime characters to complex organisms, with potential applications in VTubing and game production. Future work points to broader modality inputs and richer physical simulations to further enhance realism and generalization.
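To make the idea of serializing a skeleton tree into a token sequence more concrete, here is a minimal sketch. It is not UniRig's actual Skeleton Tree Tokenization: the vocabulary layout, the `BRANCH` and `END` tokens, the 256-bin coordinate quantization, and the `Joint` class are all assumptions chosen for illustration, and bone-type tokens are left out. It only shows how a depth-first traversal can turn a hierarchy into a flat sequence that decodes back into a valid tree.

```python
from dataclasses import dataclass, field

# Hypothetical vocabulary: coordinate bins first, then two structural tokens.
NUM_BINS = 256
BRANCH = NUM_BINS      # opens a child subtree (illustrative token, not from the paper)
END = NUM_BINS + 1     # closes a child subtree (illustrative token, not from the paper)


@dataclass
class Joint:
    position: tuple  # (x, y, z), assumed to be normalized to [-1, 1]
    children: list = field(default_factory=list)


def quantize(value, num_bins=NUM_BINS):
    """Map a coordinate in [-1, 1] to an integer bin in [0, num_bins - 1]."""
    clamped = max(-1.0, min(1.0, value))
    return min(num_bins - 1, int((clamped + 1.0) / 2.0 * num_bins))


def tokenize(joint):
    """Depth-first serialization: three coordinate tokens, then bracketed children."""
    tokens = [quantize(c) for c in joint.position]
    for child in joint.children:
        tokens.append(BRANCH)
        tokens.extend(tokenize(child))
        tokens.append(END)
    return tokens


def detokenize(tokens):
    """Rebuild the tree; raise ValueError if the sequence is not a valid tree."""
    def parse(i):
        coords = tokens[i:i + 3]
        if len(coords) < 3 or any(t >= NUM_BINS for t in coords):
            raise ValueError("expected three coordinate tokens")
        # Decode each bin to its center value.
        joint = Joint(tuple((t + 0.5) / NUM_BINS * 2.0 - 1.0 for t in coords))
        i += 3
        while i < len(tokens) and tokens[i] == BRANCH:
            child, i = parse(i + 1)
            if i >= len(tokens) or tokens[i] != END:
                raise ValueError("unclosed branch")
            joint.children.append(child)
            i += 1
        return joint, i

    root, end = parse(0)
    if end != len(tokens):
        raise ValueError("trailing tokens")
    return root
```

Because every subtree is bracketed, a decoder can reject any sequence that does not parse, which is one simple way to enforce that a generated skeleton is a well-formed tree.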

Abstract

The rapid evolution of 3D content creation, encompassing both AI-powered methods and traditional workflows, is driving an unprecedented demand for automated rigging solutions that can keep pace with the increasing complexity and diversity of 3D models. We introduce UniRig, a novel, unified framework for automatic skeletal rigging that leverages the power of large autoregressive models and a bone-point cross-attention mechanism to generate both high-quality skeletons and skinning weights. Unlike previous methods that struggle with complex or non-standard topologies, UniRig accurately predicts topologically valid skeleton structures thanks to a new Skeleton Tree Tokenization method that efficiently encodes hierarchical relationships within the skeleton. To train and evaluate UniRig, we present Rig-XL, a new large-scale dataset of over 14,000 rigged 3D models spanning a wide range of categories. UniRig significantly outperforms state-of-the-art academic and commercial methods, achieving a 215% improvement in rigging accuracy and a 194% improvement in motion accuracy on challenging datasets. Our method works seamlessly across diverse object categories, from detailed anime characters to complex organic and inorganic structures, demonstrating its versatility and robustness. By automating the tedious and time-consuming rigging process, UniRig has the potential to speed up animation pipelines with unprecedented ease and efficiency. Project Page: https://zjp-shadow.github.io/works/UniRig/
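As a rough illustration of how a cross-attention step between points and bones can yield skinning weights, the sketch below treats each point feature as a query and each bone feature as a key, then normalizes the scaled dot-product scores over the bones. This is an assumption-laden simplification, not the paper's Bone-Point Cross Attention module: learned projections, multiple heads, and the actual feature encoders are omitted, and the function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def skin_weights(point_feats, bone_feats):
    """Attend from each point (query) over all bones (keys).

    Returns one row per point; each row holds one weight per bone
    and sums to 1, as skinning weights are usually required to.
    """
    scale = 1.0 / math.sqrt(len(bone_feats[0]))
    return [
        softmax([dot(point, bone) * scale for bone in bone_feats])
        for point in point_feats
    ]
```

For example, with bone features `[[4, 0], [0, 4]]`, a point with feature `[1, 0]` is assigned mostly to the first bone, while a point with feature `[1, 1]` is split evenly between the two.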

Paper Structure

This paper contains 44 sections, 11 equations, 14 figures, 10 tables, 2 algorithms.

Figures (14)

  • Figure 1: Examples from Rig-XL, demonstrating well-defined skeleton structures.
  • Figure 2: Category distribution of Rig-XL. The percentages indicate the proportion of models belonging to each category.
  • Figure 3: Distribution of bone numbers in Rig-XL. The histogram shows the frequency of different bone counts across all models in the dataset.
  • Figure 4: Overview of the UniRig framework. The framework consists of two main stages: (a) Skeleton Tree Prediction and (b) Skin Weight Prediction. (a) The skeleton prediction stage (detailed in Section \ref{sec:ar-skeleton}) takes a point cloud sampled from the 3D mesh as input, which is first processed by the Shape Encoder to extract geometric features. These features, along with optional class information, are then fed into an autoregressive Skeleton Tree GPT to generate a token sequence representing the skeleton tree. The token sequence is then decoded into a hierarchical skeleton structure. (b) The skin weight prediction stage (detailed in Section \ref{sec:skin_pred}) takes the predicted skeleton tree from (a) and the point cloud as input. A Point-wise Encoder extracts features from the point cloud, while a Bone Encoder processes the skeleton tree. These features are then combined using a Bone-Point Cross Attention mechanism to predict the skinning weights and bone attributes. Finally, the predicted rig can be used to animate the mesh. © kinoko7
  • Figure 5: Comparison of model animation with and without spring bones. The model on the left utilizes spring bones, resulting in more natural and dynamic movement of the hair and skirt. The model on the right does not use spring bones, leading to a stiffer and less realistic appearance, with only rigid body motion.
  • ...and 9 more figures
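
The Figure 4 caption ends with the predicted rig being used to animate the mesh. A minimal sketch of that last step follows, assuming standard linear blend skinning, which this excerpt does not confirm as the deformation model used in the paper. The function names and the 3x4 matrix layout are illustrative choices.

```python
def apply_transform(matrix, point):
    """Apply a 3x4 affine transform (rotation | translation) to a 3D point."""
    x, y, z = point
    return tuple(row[0] * x + row[1] * y + row[2] * z + row[3] for row in matrix)


def linear_blend_skinning(vertices, weights, bone_transforms):
    """Deform vertices by blending per-bone transforms with skinning weights.

    vertices:        list of (x, y, z) rest-pose positions
    weights:         one row per vertex, one weight per bone (rows sum to 1)
    bone_transforms: one 3x4 matrix per bone, assumed to already map
                     rest-pose space to posed space
    """
    deformed = []
    for vertex, vertex_weights in zip(vertices, weights):
        blended = [0.0, 0.0, 0.0]
        for weight, matrix in zip(vertex_weights, bone_transforms):
            moved = apply_transform(matrix, vertex)
            for axis in range(3):
                blended[axis] += weight * moved[axis]
        deformed.append(tuple(blended))
    return deformed
```

With one bone left at the identity and a second bone translated by two units along y, a vertex weighted half and half between them moves one unit along y, while a vertex bound only to the first bone stays in place.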