HumanRig: Learning Automatic Rigging for Humanoid Character in a Large Scale Dataset

Zedong Chu, Feng Xiong, Meiduo Liu, Jinzhi Zhang, Mingqi Shao, Zhaoxu Sun, Di Wang, Mu Xu

TL;DR

HumanRig addresses the lack of large-scale, standardized data for humanoid rigging by introducing 11,434 AI-generated T-pose models aligned to a uniform Mixamo skeleton, together with a data-driven automatic rigging pipeline. The framework combines a Prior-Guided Skeleton Estimator (PGSE) to initialize coarse 3D joints, a U-shaped Point Transformer to encode mesh features, and a Mesh-Skeleton Mutual Attention Network (MSMAN) to jointly optimize skeleton construction and skinning. Key contributions include the PGSE, the MSMAN-based fusion of skeleton and mesh features, and performance gains over state-of-the-art methods on both AI-generated and artist-created meshes. The work points toward more efficient, automated rigging workflows in animation pipelines, while noting limitations around finer finger anatomy and leaving extension to quadrupeds as a potential direction.

Abstract

With the rapid evolution of 3D generation algorithms, the cost of producing 3D humanoid character models has plummeted, yet the field is impeded by the lack of a comprehensive dataset for automatic rigging, which is a pivotal step in character animation. Addressing this gap, we present HumanRig, the first large-scale dataset specifically designed for 3D humanoid character rigging, encompassing 11,434 meticulously curated T-posed meshes adhering to a uniform skeleton topology. Capitalizing on this dataset, we introduce an innovative, data-driven automatic rigging framework, which overcomes the limitations of GNN-based methods in handling complex AI-generated meshes. Our approach integrates a Prior-Guided Skeleton Estimator (PGSE) module, which uses 2D skeleton joints to provide a preliminary 3D skeleton, and a Mesh-Skeleton Mutual Attention Network (MSMAN) that fuses skeleton features with 3D mesh features extracted by a U-shaped point transformer. This enables coarse-to-fine 3D skeleton joint regression and robust skinning estimation, surpassing previous methods in quality and versatility. This work not only remedies the dataset deficiency in rigging research but also propels the animation industry towards more efficient and automated character rigging pipelines.
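
To make the idea of fusing skeleton features with mesh features through mutual attention more concrete, here is a minimal NumPy sketch. It is not the MSMAN architecture from the paper: it omits learned projections, multiple heads, and normalization, and the function names (`cross_attention`, `mutual_attention`) and the array sizes are assumptions chosen only for illustration. It shows only the general pattern in which mesh vertices attend to joints and joints attend to mesh vertices.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values):
    # Plain scaled dot-product attention without learned weights
    # (a simplification; a real network would use learned Q/K/V projections).
    d = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ keys_values, weights

def mutual_attention(mesh_feats, joint_feats):
    # Hypothetical helper: each stream is updated with information from the other.
    mesh_update, mesh_to_joint = cross_attention(mesh_feats, joint_feats)
    joint_update, _ = cross_attention(joint_feats, mesh_feats)
    # Residual connections keep the original features in the fused result.
    return mesh_feats + mesh_update, joint_feats + joint_update, mesh_to_joint

# Placeholder sizes, not taken from the paper.
rng = np.random.default_rng(0)
mesh_feats = rng.normal(size=(1024, 64))   # one feature vector per sampled vertex
joint_feats = rng.normal(size=(22, 64))    # one feature vector per joint
fused_mesh, fused_joints, attn = mutual_attention(mesh_feats, joint_feats)
```

In this sketch the vertex-to-joint attention matrix `attn` has one row per vertex and one column per joint, with rows summing to 1, which is the same shape and normalization as a skinning-weight matrix. Whether the paper derives its skinning weights this way is not stated in the abstract.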

Paper Structure

This paper contains 19 sections, 6 equations, 7 figures, 5 tables.

Figures (7)

  • Figure 1: The AI-generated mesh and the artist-created one show distinct face topology distributions. Manual meshes often have varying vertex densities across body parts and special topologies near joints for better deformation, while AI meshes tend to have chaotic topologies which lack semantic information. Previous rigging methods can only handle simple artist-created meshes, while HumanRig deals well with both AI and manual ones, especially for those with complex clothes or accessories and irregular body shapes.
  • Figure 2: Data Acquisition Pipeline for our HumanRig dataset.
  • Figure 3: Head-to-body Ratio Diversity Statistics of HumanRig.
  • Figure 4: Method Overview. Given a humanoid mesh, a coarse skeleton is predicted by a Prior-Guided Skeleton Estimator (PGSE) and helps construct the skeleton-aware vertex features. The coarse skeleton and the vertex features are fed into a Skeleton Encoder and a Mesh Encoder, respectively, then fused by a Mesh-Skeleton Mutual Attention Network to predict a refined skeleton and skinning weights with a joint learning strategy. Finally, the skeleton and skinning weights are combined to produce the animation-ready character.
  • Figure 5: Study on the Importance of Diversity in Body Shapes.
  • ...and 2 more figures
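
The Figure 4 caption says the skeleton and skinning weights are combined to produce an animation-ready character. One standard way such a combination is used at animation time is linear blend skinning (LBS), sketched below. The paper summary does not say which deformation model is used downstream, so LBS here is an assumption, as are the function name `linear_blend_skinning` and the toy inputs. Joint transforms are assumed to be 4x4 matrices already expressed relative to the rest pose.

```python
import numpy as np

def linear_blend_skinning(vertices, weights, joint_transforms):
    """Deform rest-pose vertices with per-vertex joint weights.

    vertices:         (V, 3) rest-pose positions
    weights:          (V, J) skinning weights, each row summing to 1
    joint_transforms: (J, 4, 4) transforms relative to the rest pose
    """
    ones = np.ones((len(vertices), 1))
    homo = np.concatenate([vertices, ones], axis=1)                # (V, 4)
    # Position of every vertex under every joint's transform: (V, J, 4).
    per_joint = np.einsum("jab,vb->vja", joint_transforms, homo)
    # Blend the per-joint positions using the skinning weights: (V, 4).
    blended = np.einsum("vj,vja->va", weights, per_joint)
    return blended[:, :3]

# Toy example: joint 0 stays fixed, joint 1 translates by 2 along y.
identity = np.eye(4)
translate_y = np.eye(4)
translate_y[1, 3] = 2.0
joint_transforms = np.stack([identity, translate_y])

vertices = np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0]])
weights = np.array([[0.5, 0.5],    # influenced equally by both joints
                    [1.0, 0.0]])   # bound only to the fixed joint

deformed = linear_blend_skinning(vertices, weights, joint_transforms)
# First vertex moves halfway (to y = 1); second vertex does not move.
```

The first vertex is weighted equally between a fixed joint and a joint that moves by 2 units, so it moves by 1 unit. This is the blending behavior that predicted skinning weights control.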