RigAnything: Template-Free Autoregressive Rigging for Diverse 3D Assets

Isabella Liu, Zhan Xu, Wang Yifan, Hao Tan, Zexiang Xu, Xiaolong Wang, Hao Su, Zifan Shi

TL;DR

RigAnything tackles automatic rigging for arbitrary 3D assets without predefined templates by modeling skeletons as BFS-ordered sequences of 3D joints with parent indices and learning skinning weights in a unified autoregressive transformer framework. It combines a diffusion-based joint sampler with a hybrid-attention transformer that processes both global shape context and evolving skeleton structure, and predicts connectivity and skinning weights in tandem. Trained end-to-end on RigNet and a filtered Objaverse subset, RigAnything achieves state-of-the-art results across humanoids, quadrupeds, marine life, insects, and other categories, while delivering rigging in under a few seconds per shape. The approach enhances generalizability, robustness, and efficiency for auto-rigging, enabling scalable pipelines for interactive 3D content creation.

Abstract

We present RigAnything, a novel autoregressive transformer-based model that makes 3D assets rig-ready by probabilistically generating joints and skeleton topologies and assigning skinning weights in a template-free manner. Unlike most existing auto-rigging methods, which rely on predefined skeleton templates and are limited to specific categories such as humanoids, RigAnything approaches the rigging problem in an autoregressive manner, iteratively predicting the next joint based on the global input shape and the previously predicted joints. While autoregressive models are typically used to generate sequential data, RigAnything extends their application to effectively learn and represent skeletons, which are inherently tree structures. To achieve this, we organize the joints in a breadth-first search (BFS) order, enabling the skeleton to be defined as a sequence of 3D joint locations, each paired with the index of its parent. Furthermore, our model improves the accuracy of position prediction by leveraging diffusion modeling, ensuring precise and consistent placement of joints within the hierarchy. This formulation allows the autoregressive model to efficiently capture both spatial and hierarchical relationships within the skeleton. Trained end-to-end on both RigNet and Objaverse datasets, RigAnything demonstrates state-of-the-art performance across diverse object types, including humanoids, quadrupeds, marine creatures, insects, and many more, surpassing prior methods in quality, robustness, generalizability, and efficiency. It is also significantly faster than existing auto-rigging methods, completing rigging in under a few seconds per shape. Please check our website for more details: https://www.liuisabella.com/RigAnything
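
To make the BFS formulation concrete, here is a minimal sketch of how a skeleton tree could be flattened into a sequence of (3D position, parent index) pairs. It is an illustration only, not the authors' code: the function name `bfs_serialize`, the dictionary-based tree layout, the toy joint names, and the use of `-1` as the root's parent index are all assumptions made for this example.

```python
from collections import deque


def bfs_serialize(positions, children, root):
    """Flatten a skeleton tree into BFS order.

    positions: dict mapping joint id -> (x, y, z)
    children:  dict mapping joint id -> list of child joint ids
    Returns a list of (xyz, parent_index) pairs, where parent_index is the
    position of the parent inside the returned sequence.
    Assumption: the root gets parent index -1 (a convention chosen here).
    """
    index_of = {}   # joint id -> index in the output sequence
    sequence = []   # (xyz, parent_index) pairs in BFS order
    queue = deque([(root, -1)])
    while queue:
        joint, parent_idx = queue.popleft()
        index_of[joint] = len(sequence)
        sequence.append((positions[joint], parent_idx))
        # Sibling order follows the input list; this choice is arbitrary.
        for child in children.get(joint, []):
            queue.append((child, index_of[joint]))
    return sequence


# Hypothetical toy skeleton, for illustration only.
positions = {
    "hips": (0.0, 1.0, 0.0),
    "spine": (0.0, 1.3, 0.0),
    "l_leg": (-0.1, 0.5, 0.0),
    "r_leg": (0.1, 0.5, 0.0),
    "head": (0.0, 1.7, 0.0),
}
children = {"hips": ["spine", "l_leg", "r_leg"], "spine": ["head"]}

skeleton_seq = bfs_serialize(positions, children, "hips")
parent_indices = [parent for _, parent in skeleton_seq]
print(parent_indices)  # [-1, 0, 0, 0, 1]
```

Because BFS visits parents before their children, every parent index points to an earlier element of the sequence, which is what lets a causal, next-joint model predict connectivity to preceding joints. The order among siblings is not fixed by BFS itself, so this sketch simply keeps the input order.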

Paper Structure

This paper contains 25 sections, 16 equations, 11 figures, 8 tables.

Figures (11)

  • Figure 1: Skeleton generation from real images, showing that our method generalizes well to real data. More real-capture results appear in a later figure (label: fig:real_capture).
  • Figure 2: A single step in our method: The input shape and the previously predicted skeleton sequence are tokenized using two separate tokenizers. These tokens are processed through a chain of autoregressive transformer blocks with a hybrid attention mask. Shape tokens perform self-attention to capture global geometric information, while skeleton tokens attend to all shape tokens and use causal attention within themselves to maintain the autoregressive generation process. After the transformer blocks, a skinning module decodes shape tokens into skinning weights, a joint diffusion module samples the next joint position, and a connectivity module predicts the next joint's connection to its preceding joints.
  • Figure 3: Illustration of sibling ambiguity during BFS ordering in skeletons.
  • Figure 4: Examples of different valid skeleton topologies for the same shape.
  • Figure 5: (Left) Hybrid attention mask: Shape tokens use full self-attention, while skeleton tokens attend to shape tokens and apply causal masking among themselves. (Right) The skeleton sequence is autoregressively generated during inference.
  • ...and 6 more figures
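
The hybrid attention mask described in the captions of Figures 2 and 5 can be sketched as a boolean matrix. This is a rough illustration under stated assumptions, not the paper's implementation: the function name `hybrid_attention_mask` is hypothetical, shape tokens are assumed to come first in the token sequence, and shape tokens are assumed not to attend to skeleton tokens (the captions only state that they attend fully among themselves).

```python
def hybrid_attention_mask(num_shape, num_skel):
    """Build a boolean mask where mask[q][k] is True if query q may attend to key k.

    Token layout (an assumption): indices [0, num_shape) are shape tokens,
    indices [num_shape, num_shape + num_skel) are skeleton tokens.
    """
    n = num_shape + num_skel
    mask = [[False] * n for _ in range(n)]
    for q in range(n):
        for k in range(n):
            if k < num_shape:
                # Every token, shape or skeleton, may attend to all shape tokens.
                mask[q][k] = True
            elif q >= num_shape:
                # Skeleton query and skeleton key: causal, no looking ahead.
                mask[q][k] = k <= q
            # Shape query and skeleton key stays False (assumption, see lead-in).
    return mask


mask = hybrid_attention_mask(num_shape=3, num_skel=2)
for row in mask:
    print("".join("1" if allowed else "0" for allowed in row))
```

With three shape tokens and two skeleton tokens, the three shape rows allow only the shape columns, the first skeleton row additionally allows itself, and the second skeleton row allows every column. In a tensor framework the same pattern would typically be passed to the attention layer as a boolean mask.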