PGAHum: Prior-Guided Geometry and Appearance Learning for High-Fidelity Animatable Human Reconstruction

Hao Wang, Qingshan Xu, Hongyuan Chen, Rui Ma

TL;DR

PGAHum tackles the challenge of high-fidelity animatable human reconstruction from sparse video by embedding strong 3D priors into three complementary modules. It learns a prior-based implicit geometry with a base SDF from SMPL plus a delta SDF predicted by a tri-plane network, constrained by a prior-guided sampling strategy that focuses queries near the body surface. An iterative backward deformation scheme, guided by a learned skinning predictor, progressively maps observation-space points to a canonical space for robust optimization and rendering. The approach achieves finer geometric details and more photorealistic novel-view synthesis across unseen poses, demonstrated on multiple datasets, with ablations confirming the effectiveness of each module. These results enable high-quality, animatable human avatars from sparsely captured data, with potential impact on AR/VR, virtual try-on, and digital humans in interactive applications.
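The base-plus-delta geometry described above can be illustrated with a toy sketch. It assumes the two SDF terms are simply added, which the summary suggests but does not spell out. A sphere stands in for the SMPL-derived base SDF and a small analytic bump stands in for the tri-plane network output. The names `base_sdf`, `delta_sdf`, and `composed_sdf` are hypothetical and not taken from the paper.

```python
import math

def base_sdf(x, radius=1.0):
    """Stand-in for the SMPL-derived base SDF: a sphere centred at the origin."""
    return math.sqrt(sum(c * c for c in x)) - radius

def delta_sdf(x, amplitude=0.05):
    """Stand-in for the tri-plane network output: a small bounded offset."""
    return amplitude * math.sin(4.0 * x[0]) * math.cos(4.0 * x[1])

def composed_sdf(x):
    """Assumed additive combination: coarse body shape plus surface detail."""
    return base_sdf(x) + delta_sdf(x)

if __name__ == "__main__":
    p = (0.9, 0.1, 0.0)
    print(base_sdf(p), delta_sdf(p), composed_sdf(p))
```

Keeping the offset small relative to the base term reflects the disentanglement idea: the prior fixes the overall body shape, and the learned term only has to account for residual detail.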

Abstract

Recent techniques on implicit geometry representation learning and neural rendering have shown promising results for 3D clothed human reconstruction from sparse video inputs. However, it is still challenging to reconstruct detailed surface geometry and even more difficult to synthesize photorealistic novel views with animated human poses. In this work, we introduce PGAHum, a prior-guided geometry and appearance learning framework for high-fidelity animatable human reconstruction. We thoroughly exploit 3D human priors in three key modules of PGAHum to achieve high-quality geometry reconstruction with intricate details and photorealistic view synthesis on unseen poses. First, a prior-based implicit geometry representation of 3D human, which contains a delta SDF predicted by a tri-plane network and a base SDF derived from the prior SMPL model, is proposed to model the surface details and the body shape in a disentangled manner. Second, we introduce a novel prior-guided sampling strategy that fully leverages the prior information of the human pose and body to sample the query points within or near the body surface. By avoiding unnecessary learning in the empty 3D space, the neural rendering can recover more appearance details. Last, we propose a novel iterative backward deformation strategy to progressively find the correspondence for the query point in observation space. A skinning weights prediction model is learned based on the prior provided by the SMPL model to achieve the iterative backward LBS deformation. Extensive quantitative and qualitative comparisons on various datasets are conducted and the results demonstrate the superiority of our framework. Ablation studies also verify the effectiveness of each scheme for geometry and appearance learning.
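As a rough illustration of the iterative backward deformation idea, the sketch below inverts a blend-skinning warp by fixed-point iteration. It is a simplified stand-in, not the paper's algorithm. It assumes a hypothetical two-bone rig whose bones are pure translations, so the blended transform is trivially invertible, and an analytic `skinning_weights` function in place of the learned skinning weights prediction model. Each iteration re-evaluates the weights at the current canonical estimate and uses them to undo the warp of the observation-space point.

```python
import math

# Hypothetical two-bone rig; each bone is a pure translation (an assumption
# made so the blended transform can be inverted by subtraction).
BONE_TRANSLATIONS = [(0.0, 0.0, 0.0), (0.3, 0.2, 0.0)]

def skinning_weights(x_cnl, sharpness=4.0):
    """Stand-in for a learned skinning predictor, queried in canonical space."""
    w1 = 1.0 / (1.0 + math.exp(-sharpness * x_cnl[0]))
    return [1.0 - w1, w1]

def blended_offset(weights):
    """Weighted sum of the bone translations (the LBS blend for this toy rig)."""
    return tuple(
        sum(w * t[i] for w, t in zip(weights, BONE_TRANSLATIONS))
        for i in range(3)
    )

def forward_lbs(x_cnl):
    """Canonical space to observation space."""
    offset = blended_offset(skinning_weights(x_cnl))
    return tuple(x + o for x, o in zip(x_cnl, offset))

def iterative_backward(x_obs, num_iters=30):
    """Observation space to canonical space by fixed-point iteration."""
    x_cnl = x_obs  # initial guess: assume no deformation
    for _ in range(num_iters):
        offset = blended_offset(skinning_weights(x_cnl))
        x_cnl = tuple(x - o for x, o in zip(x_obs, offset))
    return x_cnl

if __name__ == "__main__":
    x_true = (0.1, -0.4, 0.7)
    x_rec = iterative_backward(forward_lbs(x_true))
    print(x_true, x_rec)
```

The iteration converges here because the weights change slowly relative to the bone offsets. With full rigid bone transforms the subtraction would be replaced by applying the inverse of the blended transform, and convergence would depend on how smooth the predicted weights are.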

Paper Structure (17 sections, 10 equations, 13 figures, 5 tables, 1 algorithm)

This paper contains 17 sections, 10 equations, 13 figures, 5 tables, 1 algorithm.

Figures (13)

  • Figure 1: Given sparse input videos, our PGAHum reconstructs a high-fidelity animatable avatar with fine-grained geometry and appearance details on various datasets, e.g., ZJU-MoCap peng2021neural (top), PeopleSnapshot alldieck2018detailed (middle) and MonoCap peng2024animatable (bottom).
  • Figure 2: Overview of our pipeline. For an input view from the multi-view video frames with estimated human pose, we first utilize prior-guided sampling to sample points inside and around the human body based on the ray-body intersection, where SMPL is used as a prior for the body model. For a sampled point $\textbf{x}_{obs}$, we deform it to the corresponding point $\textbf{x}_{cnl}$ in a canonical space through the iterative backward deformation. With the transformed points, we learn a prior-based implicit geometry representation which combines the prior SDF volume $\mathcal{S}_{base}$ derived from SMPL with $\mathcal{S}_{delta}$ predicted by a tri-plane network $F_{\phi_s}$ for modeling the human body with surface details in canonical space. In addition, a feature vector $\mu$ produced by $F_{\phi_s}$, together with the view direction $\textbf{v}$, is passed to the color branch $F_{\phi_c}$ to obtain the color value. Finally, volume rendering is performed to render images, normal maps and subject masks for the loss computation.
  • Figure 3: Qualitative results on ZJU-MoCap dataset for novel view synthesis on training poses.
  • Figure 4: Qualitative results on ZJU-MoCap dataset for geometry reconstruction.
  • Figure 5: Qualitative results on SyntheticHuman++ dataset for geometry reconstruction.
  • ...and 8 more figures
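
The prior-guided sampling step in the Figure 2 caption restricts query points to the part of each camera ray that passes through or near the body. A minimal sketch of that idea follows. It assumes a sphere with a small margin as a crude proxy for the SMPL body, whereas the paper intersects rays with the body model itself, and it uses evenly spaced samples. The functions `ray_sphere_interval` and `sample_near_body` and the `margin` parameter are hypothetical.

```python
import math

def ray_sphere_interval(origin, direction, center, radius):
    """Return (t_near, t_far) where the ray overlaps the sphere, or None.

    `direction` is assumed to be unit length.
    """
    oc = tuple(o - c for o, c in zip(origin, center))
    b = sum(a * d for a, d in zip(oc, direction))
    c = sum(a * a for a in oc) - radius * radius
    disc = b * b - c
    if disc < 0.0:
        return None  # the ray misses the proxy entirely
    root = math.sqrt(disc)
    t_near, t_far = -b - root, -b + root
    if t_far < 0.0:
        return None  # the proxy lies behind the ray origin
    return max(t_near, 0.0), t_far

def sample_near_body(origin, direction, center, radius, margin=0.05, n=8):
    """Place n evenly spaced samples on the ray segment near the body proxy."""
    interval = ray_sphere_interval(origin, direction, center, radius + margin)
    if interval is None:
        return []  # no samples are spent on empty space
    t_near, t_far = interval
    ts = [t_near + (t_far - t_near) * (i + 0.5) / n for i in range(n)]
    return [tuple(o + t * d for o, d in zip(origin, direction)) for t in ts]

if __name__ == "__main__":
    pts = sample_near_body((0.0, 0.0, -3.0), (0.0, 0.0, 1.0), (0.0, 0.0, 0.0), 1.0)
    print(len(pts), pts[0], pts[-1])
```

Rays that miss the proxy return no samples, which is the mechanism by which this kind of sampling avoids spending network capacity on empty space.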