ASMR: Adaptive Skeleton-Mesh Rigging and Skinning via 2D Generative Prior

Seokhyeon Hong, Soojin Choi, Chaelin Kim, Sihun Cha, Junyong Noh

TL;DR

This work tackles automatic rigging and skinning of stylized character meshes using skeletal motion data across arbitrary mesh and skeleton configurations. It introduces a two-stage framework with Skeletal Articulation Prediction and Skinning Weight Prediction, augmented by Diffusion 3D Features (Diff3F) as a semantic prior and self-supervised learning via vertex reconstruction. The approach aligns a target skeleton to the mesh through offset residuals and cross-attention, while inferring implicit per-vertex skinning weights with an attention mechanism, enabling robust Linear Blend Skinning (LBS) across unseen character configurations. Empirical results on Mixamo and LaFAN1 show improved rigging and deformation quality over baselines (Pinocchio, RigNet, NBS), highlighting strong generalization enabled by Diff3F and skeleton-aware learning.
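For readers unfamiliar with Linear Blend Skinning, the following is a minimal sketch of the standard LBS formula, in which each deformed vertex is a weighted sum of that vertex transformed by every joint. It is not the paper's implementation. The softmax step only illustrates how attention-style scores can be turned into weights that sum to one. The function names, the 3x4 transform layout, and the toy scores are all assumptions made for illustration.

```python
import math

def softmax(scores):
    """Normalize raw scores into non-negative weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def transform_point(T, p):
    """Apply a 3x4 affine transform [R | t] (nested lists) to a 3D point."""
    return [sum(T[r][c] * p[c] for c in range(3)) + T[r][3] for r in range(3)]

def linear_blend_skinning(vertices, weights, joint_transforms):
    """Standard LBS: v' = sum_j w_j * (T_j applied to v).

    vertices:         V rest-pose points [x, y, z]
    weights:          V rows of J weights, each row summing to 1
    joint_transforms: J 3x4 matrices mapping rest pose to posed space
    """
    deformed = []
    for v, w_row in zip(vertices, weights):
        out = [0.0, 0.0, 0.0]
        for w, T in zip(w_row, joint_transforms):
            tv = transform_point(T, v)
            out = [o + w * x for o, x in zip(out, tv)]
        deformed.append(out)
    return deformed

# Toy setup (hypothetical values): joint 0 stays put, joint 1 moves up by 2.
identity = [[1, 0, 0, 0.0], [0, 1, 0, 0.0], [0, 0, 1, 0.0]]
shift_up = [[1, 0, 0, 0.0], [0, 1, 0, 2.0], [0, 0, 1, 0.0]]

vertices = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
# Hypothetical vertex-to-joint scores; a learned model would produce these.
scores = [[0.0, 0.0], [5.0, -5.0]]
weights = [softmax(row) for row in scores]

posed = linear_blend_skinning(vertices, weights, [identity, shift_up])
```

In this toy case the first vertex is influenced equally by both joints and ends up halfway along the upward shift, while the second vertex is dominated by the stationary joint and barely moves.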

Abstract

Despite the growing accessibility of skeletal motion data, integrating it for animating character meshes remains challenging due to diverse configurations of both skeletons and meshes. Specifically, the body scale and bone lengths of the skeleton should be adjusted in accordance with the size and proportions of the mesh, ensuring that all joints are accurately positioned within the character mesh. Furthermore, defining skinning weights is complicated by variations in skeletal configurations, such as the number of joints and their hierarchy, as well as differences in mesh configurations, including their connectivity and shapes. While existing approaches have made efforts to automate this process, they hardly address the variations in both skeletal and mesh configurations. In this paper, we present a novel method for the automatic rigging and skinning of character meshes using skeletal motion data, accommodating arbitrary configurations of both meshes and skeletons. The proposed method predicts the optimal skeleton aligned with the size and proportion of the mesh as well as defines skinning weights for various mesh-skeleton configurations, without requiring explicit supervision tailored to each of them. By incorporating Diffusion 3D Features (Diff3F) as semantic descriptors of character meshes, our method achieves robust generalization across different configurations. To assess the performance of our method in comparison to existing approaches, we conducted comprehensive evaluations encompassing both quantitative and qualitative analyses, specifically examining the predicted skeletons, skinning weights, and deformation quality.

Paper Structure

This paper contains 24 sections, 18 equations, 19 figures, 7 tables.

Figures (19)

  • Figure 1: Comparison of the robustness of different auto-rigging and skinning approaches to variations in input configurations. While each approach has limited robustness or is not applicable in at least one component, the proposed method achieves robustness across all components.
  • Figure 2: Overview of our method. Given a source mesh to be animated and a source skeletal motion to derive its movement, our method predicts the target skeleton and the corresponding skinning weights that generate plausible deformation of the character mesh in accordance with the source skeletal movement. To accommodate source skeletons with arbitrary structures, we leverage an off-the-shelf retargeting module that aligns the target skeleton to the source pose, generating the target pose. Finally, the target pose, combined with the predicted skinning weights, is used to deform the character mesh. While our method does not rely on textural information, textures on the character meshes are included only to illustrate different poses.
  • Figure 3: Visualizations of the vertex correspondences between characters using Diff3F, where corresponding points are similarly colored. The source character is shown on the left, and the target characters are on the right.
  • Figure 4: Comparison to baselines on skeleton prediction results given the same mesh with different source skeletons. Each skeleton has a distinct body scale and bone lengths, with a varying number of joints: from top to bottom, 25, 24, and 23 joints.
  • Figure 5: Skinning weight results predicted from a source skeleton with a fixed number of joints.
  • ...and 14 more figures