Make-It-Poseable: Feed-forward Latent Posing Model for 3D Humanoid Character Animation

Zhiyang Guo, Ori Zhang, Jax Xiang, Alan Zhao, Wengang Zhou, Houqiang Li

TL;DR

The paper tackles robust 3D humanoid posing by reconceiving posing as a latent-space transformation rather than a geometry-space manipulation, addressing the skinning artifacts and topology limitations of traditional rigging and pose-conditioned generative methods. It introduces a dense skeleton-conditioned latent space, a latent posing transformer, and a latent-space supervision mechanism, complemented by an adaptive completion module for newly exposed geometry. The approach achieves superior pose fidelity and faster inference, and it enables editing tasks such as part replacement and refinement. This latent-space framework supports efficient, scalable animation and editing of 3D characters with less dependence on manual rigging and topology constraints.
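To make "dense skeleton-conditioned" concrete, here is a minimal, hypothetical Python sketch of one way to give every shape token its own pose feature. It is not the paper's skeleton encoder, which is learned; the per-token anchor points, the nearest-bone binding, and the names `dense_pose_features`, `rotate_about`, and `rot_z` are all assumptions made for illustration.

```python
import math

# Hypothetical illustration only: the paper's skeleton encoder is learned.
# Assumption: each shape token has a 3D anchor point, is bound to its
# nearest bone, and receives the displacement that the bone's rigid motion
# (source pose -> target pose) would apply to that anchor.


def rot_z(theta):
    """3x3 rotation matrix about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def mat_vec(m, v):
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def point_segment_dist(p, a, b):
    """Distance from point p to the bone segment a-b."""
    ab = [b[i] - a[i] for i in range(3)]
    ap = [p[i] - a[i] for i in range(3)]
    denom = sum(x * x for x in ab) or 1e-12
    t = max(0.0, min(1.0, sum(ap[i] * ab[i] for i in range(3)) / denom))
    closest = [a[i] + t * ab[i] for i in range(3)]
    return math.dist(p, closest)


def rotate_about(pivot, R):
    """Rigid transform (R, t) that rotates by R around a pivot point."""
    rp = mat_vec(R, pivot)
    return R, [pivot[i] - rp[i] for i in range(3)]


def dense_pose_features(anchors, bones, bone_motions):
    """Return one 3D feature per anchor, so pose features align
    one-to-one with shape tokens.

    anchors:      token anchor points in the source pose
    bones:        (head, tail) bone segments in the source pose
    bone_motions: (R, t) rigid transforms, source -> target
    """
    feats = []
    for p in anchors:
        k = min(range(len(bones)), key=lambda i: point_segment_dist(p, *bones[i]))
        R, t = bone_motions[k]
        moved = [x + ti for x, ti in zip(mat_vec(R, p), t)]
        feats.append([moved[i] - p[i] for i in range(3)])
    return feats


if __name__ == "__main__":
    # Two-bone arm along x; the forearm bends 90 degrees at the elbow.
    bones = [([0.0, 0.0, 0.0], [1.0, 0.0, 0.0]), ([1.0, 0.0, 0.0], [2.0, 0.0, 0.0])]
    identity = ([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]], [0.0, 0.0, 0.0])
    motions = [identity, rotate_about([1.0, 0.0, 0.0], rot_z(math.pi / 2))]
    anchors = [[0.5, 0.0, 0.0], [1.5, 0.1, 0.0]]
    print(dense_pose_features(anchors, bones, motions))
```

Because each token carries its own motion feature, a downstream network can treat pose as per-token conditioning rather than a single global code; that per-token alignment is the property the paper's dense representation is described as providing.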

Abstract

Posing 3D characters is a fundamental task in computer graphics and vision. However, existing methods like auto-rigging and pose-conditioned generation often struggle with challenges such as inaccurate skinning weight prediction, topological imperfections, and poor pose conformance, limiting their robustness and generalizability. To overcome these limitations, we introduce Make-It-Poseable, a novel feed-forward framework that reformulates character posing as a latent-space transformation problem. Instead of deforming mesh vertices as in traditional pipelines, our method reconstructs the character in new poses by directly manipulating its latent representation. At the core of our method is a latent posing transformer that manipulates shape tokens based on skeletal motion. This process is facilitated by a dense pose representation for precise control. To ensure high-fidelity geometry and accommodate topological changes, we also introduce a latent-space supervision strategy and an adaptive completion module. Our method demonstrates superior performance in posing quality. It also naturally extends to 3D editing applications like part replacement and refinement.
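The core idea of the abstract, a transformer that maps source shape tokens to target shape tokens given skeletal motion, can be illustrated with a toy, pure-Python attention block. Everything here is an assumption: the class name `ToyLatentPoser`, adding each pose token to its matching shape token, the single attention head, the random weights, and the residual update. The actual model's depth, token layout, and conditioning scheme are not specified in this summary.

```python
import math
import random

# Toy sketch, not the paper's architecture: one single-head self-attention
# block that updates shape tokens conditioned on pose tokens aligned
# one-to-one with them. Weights are random; a real model is trained.


def matmul(a, b):
    cols = len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(cols)]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def softmax(row):
    m = max(row)
    e = [math.exp(x - m) for x in row]
    s = sum(e)
    return [x / s for x in e]


class ToyLatentPoser:
    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        self.dim = dim

        def init():
            return [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(dim)]

        self.wq, self.wk, self.wv, self.wo = init(), init(), init(), init()

    def __call__(self, shape_tokens, pose_tokens):
        # Token i is conditioned on pose token i (one-to-one correspondence).
        x = [[s + p for s, p in zip(st, pt)] for st, pt in zip(shape_tokens, pose_tokens)]
        q, k, v = matmul(x, self.wq), matmul(x, self.wk), matmul(x, self.wv)
        scale = 1.0 / math.sqrt(self.dim)
        # Every token attends to all tokens to coordinate the deformation.
        attn = [softmax([a * scale for a in row]) for row in matmul(q, transpose(k))]
        delta = matmul(matmul(attn, v), self.wo)
        # Residual update: predicted target tokens = source tokens + delta.
        return [[s + d for s, d in zip(st, dt)] for st, dt in zip(shape_tokens, delta)]


if __name__ == "__main__":
    rng = random.Random(1)
    n_tokens, dim = 4, 8
    shape = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_tokens)]
    pose = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_tokens)]
    target = ToyLatentPoser(dim)(shape, pose)
    print(len(target), len(target[0]))
```

With the residual form, a zero update leaves the input shape tokens unchanged, which is one plausible reason to predict target tokens as the source tokens plus a correction; whether the paper uses this exact form is not stated here.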

Paper Structure

This paper contains 34 sections, 3 equations, 9 figures, 3 tables.

Figures (9)

  • Figure 1: Given a 3D humanoid model of arbitrary shape and initial pose, our method efficiently re-poses it in a single feed-forward pass. Unlike auto-rigging or generative approaches that often suffer from skinning artifacts or limited controllability, our latent posing paradigm robustly handles challenging cases and produces high-fidelity animation results.
  • Figure 2: Pipeline of our character posing framework. Given a source shape and source/target skeletons, we encode them into latent representations with dense correspondence. A latent posing transformer then predicts the target shape tokens, which are finally decoded into the posed mesh. This framework is trained in two stages. First, a latent loss is established to preserve geometric details. Second, an adaptive completion module is exclusively finetuned with an SDF loss to synthesize plausible geometry for newly exposed structures.
  • Figure 3: Illustration of our key designs. (a) The skeleton encoder (see the skeleton-encoder section) produces dense pose representations with one-to-one correspondence at the latent level. (b) Latent-space supervision (see the training section) ensures a semantically meaningful token transformation path that preserves geometric details. (c) Adaptive tokens (see the adaptive-tokens section) are introduced in the finetuning stage to handle structures newly exposed after deformation.
  • Figure 4: Qualitative comparison on diverse characters and poses. We show results for re-posing each character into the widely adopted T-pose and an additional random pose. Our method produces high-fidelity results across various cases. It robustly handles challenging inputs where MIA [MIA] and Puppeteer [song2025puppeteer] produce significant artifacts, and it gives better pose conformance and detail preservation than HY3D-Omni [hunyuan3domni]. Note that HY3D-Omni takes a rendered image as input, not the 3D shape itself.
  • Figure 5: Our method enables various applications, e.g., (a) animation, (b) part segmentation, (c) replacement, and (d) refinement.
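Figure 2 describes a two-stage schedule: a latent loss first, then SDF-loss finetuning of the adaptive completion module alone. The sketch below writes both objectives and the stage-dependent parameter selection in plain Python. The exact loss forms (mean squared error, clamped L1), the clamp value, which modules train in stage 1, and the `adaptive.` parameter prefix are assumptions, not details taken from the paper.

```python
# Hypothetical sketch of the two training objectives described for Figure 2.
# Assumed forms (not confirmed by the paper): an MSE latent loss in stage 1
# and a clamped L1 SDF loss in stage 2, where only the adaptive completion
# module is updated.


def latent_loss(pred_tokens, target_tokens):
    """Stage 1: mean squared error between predicted target tokens and the
    tokens obtained by encoding the ground-truth posed shape."""
    total, count = 0.0, 0
    for p_row, t_row in zip(pred_tokens, target_tokens):
        for p, t in zip(p_row, t_row):
            total += (p - t) ** 2
            count += 1
    return total / count


def sdf_loss(pred_sdf, gt_sdf, clamp=0.1):
    """Stage 2: L1 error on SDF values at query points, with both values
    clamped to [-clamp, clamp] so supervision focuses near the surface."""
    def c(x):
        return max(-clamp, min(clamp, x))

    errors = [abs(c(p) - c(g)) for p, g in zip(pred_sdf, gt_sdf)]
    return sum(errors) / len(errors)


def trainable(stage, param_names, adaptive_prefix="adaptive."):
    """In this sketch, stage 1 trains every parameter except the adaptive
    module (not yet introduced) and stage 2 finetunes only the adaptive
    completion module. The 'adaptive.' naming scheme is hypothetical."""
    if stage == 1:
        return [n for n in param_names if not n.startswith(adaptive_prefix)]
    if stage == 2:
        return [n for n in param_names if n.startswith(adaptive_prefix)]
    raise ValueError(f"unknown stage: {stage}")
```

Restricting stage 2 to the adaptive parameters leaves the posing behavior learned in stage 1 untouched while the completion module learns to fill in newly exposed regions.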