SPRig: Self-Supervised Pose-Invariant Rigging from Mesh Sequences

Ruipeng Wang; Langkun Zhong; Miaowei Wang

Abstract

State-of-the-art rigging methods assume a canonical rest pose, an assumption that fails for sequential data (e.g., animal motion capture or AIGC/video-derived mesh sequences) that lacks a T-pose. Applied frame by frame, these methods are not pose-invariant and produce topological inconsistencies across frames. We therefore propose SPRig, a general fine-tuning framework that enforces cross-frame consistency losses to learn pose-invariant rigs on top of existing models. We validate our approach using a new permutation-invariant stability protocol. Experiments demonstrate state-of-the-art temporal stability: our method produces coherent rigs from challenging sequences and dramatically reduces the artifacts that plague baseline methods. The code will be released publicly upon acceptance.
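
To illustrate what a permutation-invariant comparison between skeletons can look like, the sketch below uses a symmetric Chamfer distance over joint positions. This is only an assumed stand-in: the abstract does not specify the metric used in the stability protocol, and the function name `chamfer_distance` and the toy skeletons are hypothetical. The sketch also assumes the joints of both frames have already been brought into a common coordinate frame.

```python
import math

def chamfer_distance(joints_a, joints_b):
    """Symmetric Chamfer distance between two sets of 3D joint positions.

    Each argument is a non-empty list of (x, y, z) tuples. The result does not
    depend on the order in which joints are listed in either set.
    """
    def one_sided(src, dst):
        # Mean distance from each source joint to its nearest neighbour in dst.
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return one_sided(joints_a, joints_b) + one_sided(joints_b, joints_a)

# Hypothetical toy skeletons: the same three joints listed in different orders.
frame_0 = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
frame_1 = [(1.0, 1.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
# A skeleton that dropped one joint, as an unstable rig might.
frame_2 = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)]

same = chamfer_distance(frame_0, frame_1)     # joint order differs only
dropped = chamfer_distance(frame_0, frame_2)  # one joint is missing
```

Because the distance is built from nearest-neighbour matches, reordering the joints leaves it unchanged, while a missing joint raises it. That is the property a stability score needs when a generator emits joints in a different order from frame to frame.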

Paper Structure (55 sections, 71 equations, 7 figures, 7 tables)

Introduction
Related Work
Skeleton Generation
Backbone
Token-space (Transformer) consistency
Geometry-space (permutation-invariant) consistency
Overall objective
Skinning Generation
Backbone
Anchor teacher via barycentric transfer
Loss: articulation-invariant consistency with geometric prior
Experiment
Skeleton Generation
Skinning Generation
Discussion
...and 40 more sections

Figures (7)

Figure 1: Our method (top, blue) yields a complete, temporally consistent skeleton with smooth, coherent skinning weights, whereas Puppeteer [song2025puppeteer] (bottom, red) produces an incomplete skeleton with missing hand rigging and unstable, blocky skinning.
Figure 2: Skeleton generation overview. Point clouds sampled from mesh sequences are fed to a Transformer-based skeleton generator. An anchor skeleton from the original generator defines a canonical target; token-space and geometry-space consistency losses fine-tune the model so that decoded tokens yield temporally consistent skeletons.
Figure 3: Skinning generation pipeline overview. A high-quality anchor teacher is first generated using a pretrained generator on the anchor frame. Pose-consistent query points are sampled from the full mesh sequence via barycentric interpolation. A skinning generator is then fine-tuned: its predictions on all query points are compared against the single anchor teacher using our articulation-invariant consistency loss, forcing the model to learn a pose-invariant mapping and produce temporally consistent skinning.
Figure 4: Qualitative comparison of skeleton predictions on non-humanoid (Bear, left) and humanoid (Human, right) meshes. Top row (blue): our method produces temporally stable and complete skeletons across frames. Bottom row (red): the Puppeteer baseline often misses or distorts structures (e.g., missing leg skeletons on the bear and missing hand rigging on the human), and exhibits inconsistent skeleton topology between frames (red boxes).
Figure 5: Qualitative comparison of temporal consistency ($L_1$ error heatmap). We visualize the per-vertex $L_1$ error, $L_1(\hat{\mathbf{W}}^{(k)}, \mathbf{W}^T)$, between the prediction on a perturbed frame and the static anchor teacher $\mathbf{W}^T$. The colormap ranges from blue (zero error) to red (high error). (Left - Puppeteer): The baseline model exhibits severe temporal inconsistency, with large high-error regions on the limbs. (Right - Ours): Our fine-tuning framework almost completely eliminates this inconsistency. The high-error regions are suppressed, and the surfaces remain dark blue, visually confirming the quantitative gains in Table \ref{tab:skinning_temporal}.
...and 2 more figures
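
The barycentric transfer in Figure 3 and the per-point $L_1$ error in Figure 5 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: all frames are assumed to share one triangle list, faces are picked uniformly rather than by area, and the helper names `sample_barycentric`, `interpolate`, and `l1_error` are hypothetical.

```python
import random

def sample_barycentric(num_faces, num_samples, seed=0):
    """Draw (face index, barycentric weights) pairs once for the whole sequence."""
    rng = random.Random(seed)
    samples = []
    for _ in range(num_samples):
        face = rng.randrange(num_faces)  # assumption: uniform over faces, not area-weighted
        u, v = rng.random(), rng.random()
        if u + v > 1.0:
            # Reflect the point back into the triangle.
            u, v = 1.0 - u, 1.0 - v
        samples.append((face, (1.0 - u - v, u, v)))
    return samples

def interpolate(samples, faces, per_vertex):
    """Blend a per-vertex attribute (positions or skinning weights) at each sample."""
    out = []
    for face, bary in samples:
        corners = [per_vertex[i] for i in faces[face]]
        dim = len(corners[0])
        out.append(tuple(sum(b * c[d] for b, c in zip(bary, corners))
                         for d in range(dim)))
    return out

def l1_error(pred, target):
    """Per-point L1 distance between predicted and teacher skinning weights."""
    return [sum(abs(p - t) for p, t in zip(pr, tg)) for pr, tg in zip(pred, target)]

# Hypothetical toy data: one triangle, an anchor frame, and a translated frame.
faces = [(0, 1, 2)]
verts_anchor = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
verts_posed = [(x + 1.0, y, z) for x, y, z in verts_anchor]
anchor_weights = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5)]  # two bones, rows sum to 1

samples = sample_barycentric(num_faces=len(faces), num_samples=4)
queries_anchor = interpolate(samples, faces, verts_anchor)  # query points on the anchor frame
queries_posed = interpolate(samples, faces, verts_posed)    # same surface points, new pose
teacher = interpolate(samples, faces, anchor_weights)       # one teacher target per query
errors = l1_error(teacher, teacher)                         # a perfect prediction has zero error
```

Reusing the same face indices and barycentric weights on every frame tracks the same surface points through the motion, so a single set of teacher weights from the anchor frame can supervise predictions on all poses.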
