FastAnimate: Towards Learnable Template Construction and Pose Deformation for Fast 3D Human Avatar Animation

Jian Shu, Nanjie Yao, Gangjian Zhang, Junlong Ren, Yu Feng, Hao Wang

TL;DR

FastAnimate targets real-time, high-fidelity 3D human avatar animation by unifying canonical template construction and target pose deformation into a two-stage, learnable framework. It decouples texture and geometry through a UV/pose-aware template representation that yields adaptive 3D Gaussians, then refines geometry via a learnable module trained with multi-view geometry supervision. The approach achieves state-of-the-art pose accuracy and texture quality while delivering real-time performance (≈0.1s per instance) on challenging datasets like X-Humans. This work advances practical, scalable avatar animation by mitigating LBS artifacts and enabling efficient, pose-conditioned rendering with high-detail textures.

Abstract

3D human avatar animation aims to transform a human avatar from an arbitrary initial pose to a specified target pose using deformation algorithms. Existing approaches typically divide this task into two stages: canonical template construction and target pose deformation. However, current template construction methods demand extensive skeletal rigging and often produce artifacts for specific poses. Moreover, target pose deformation suffers from structural distortions caused by Linear Blend Skinning (LBS), which significantly undermine animation realism. To address these problems, we propose a unified learning-based framework that tackles both challenges in two phases. In the first phase, to overcome the inefficiencies and artifacts of template construction, we leverage a U-Net architecture that decouples texture and pose information in a feed-forward process, enabling fast generation of a human template. In the second phase, we propose a data-driven refinement technique that enhances structural integrity. Extensive experiments show that our model delivers consistent performance across diverse poses with an optimal balance between efficiency and quality, surpassing state-of-the-art (SOTA) methods.

Paper Structure

This paper contains 13 sections, 12 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Visualization of the geometric stretching problem under deformation (left) and our result (right).
  • Figure 2: Overview of the proposed FastAnimate. The framework consists of two stages. In the first stage, we decouple the UV feature and the pose feature from the canonical mesh to build canonical Gaussians from the given human scans and the SMPL-X canonical template. In the second stage, we use LBS to drive the canonical Gaussians into coarse animated Gaussians. To further improve animation quality, a coarse geometry refinement module then produces high-fidelity refined animated human Gaussians. With FastAnimate, we achieve robust 3D human animation results with enhanced texture quality and pose correctness.
  • Figure 3: Qualitative comparison with state-of-the-art methods on novel pose synthesis. Note that, because the camera parameters differ, the LIFE-GOM results are rendered from a slightly different viewing angle than the others. Please zoom in for a detailed view.
  • Figure 4: Visual ablation of proposed modules. Proposed modules improve fine-grained texture and geometry quality.
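
The abstract and the Figure 2 caption both rely on Linear Blend Skinning (LBS): it drives the canonical Gaussians to a coarse posed result, and it is named as the source of the structural distortions that the refinement module corrects. As background only, here is a minimal sketch of textbook LBS in plain Python. It is not the paper's implementation, and all names (`rotation_z`, `transform_point`, `lbs`, `IDENTITY`) are hypothetical helpers introduced for illustration. The sketch assumes each point carries one weight per bone and that each bone is given as a 4x4 homogeneous transform.

```python
import math

# Hypothetical helper names; generic LBS, not code from the paper.
IDENTITY = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]


def rotation_z(theta, pivot=(0.0, 0.0, 0.0)):
    """4x4 homogeneous transform: rotate by theta about the z-axis through pivot."""
    c, s = math.cos(theta), math.sin(theta)
    px, py, _ = pivot
    # R (x - p) + p  =  R x + (p - R p), so the translation part is p - R p.
    tx = px - (c * px - s * py)
    ty = py - (s * px + c * py)
    return [
        [c, -s, 0.0, tx],
        [s, c, 0.0, ty],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]


def transform_point(T, p):
    """Apply a 4x4 homogeneous transform T to a 3D point p."""
    h = (p[0], p[1], p[2], 1.0)
    return tuple(sum(T[r][k] * h[k] for k in range(4)) for r in range(3))


def lbs(points, weights, bone_transforms):
    """Linear blend skinning: x' = sum_j w_j * (T_j x) for every point x."""
    posed = []
    for p, w in zip(points, weights):
        acc = [0.0, 0.0, 0.0]
        for w_j, T_j in zip(w, bone_transforms):
            q = transform_point(T_j, p)
            for axis in range(3):
                acc[axis] += w_j * q[axis]
        posed.append(tuple(acc))
    return posed


# Two bones along the z-axis; the second one twists by 180 degrees.
bones = [IDENTITY, rotation_z(math.pi)]

# A surface point at radius 1 from the bone axis, weighted equally to both bones.
blended = lbs([(1.0, 0.0, 0.5)], [(0.5, 0.5)], bones)[0]

# The same point bound rigidly to the first bone stays where it was.
rigid = lbs([(1.0, 0.0, 0.5)], [(1.0, 0.0)], bones)[0]

print("blended:", blended)
print("rigid:  ", rigid)
```

In this toy setup the equally weighted point is pulled onto the bone axis (its distance from the axis shrinks from 1 to roughly 0), because LBS averages transformed positions linearly rather than interpolating rotations. This volume collapse under twisting is one well-known kind of LBS distortion; whether it matches the specific stretching shown in Figure 1 is an assumption here.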