WonderHuman: Hallucinating Unseen Parts in Dynamic 3D Human Reconstruction

Zilong Wang; Zhiyang Dou; Yuan Liu; Cheng Lin; Xiao Dong; Yunhui Guo; Chenxu Zhang; Xin Li; Wenping Wang; Xiaohu Guo

TL;DR

WonderHuman reconstructs dynamic human avatars from monocular video, using diffusion-model priors to hallucinate the body parts the camera never sees. It combines 3D Gaussian Splatting with Score Distillation Sampling (SDS) applied in both canonical and observation spaces (Dual-Space Optimization), guided by a View Selection strategy and Pose Feature Injection that keep the result consistent with the observed pose. Stage I reconstructs the visible appearance; Stage II uses SDS-based diffusion priors to infer the unseen regions, supervised by normal maps and reinforced by visibility-aware refinement. A progressive training schedule balances canonical-space and observation-space learning. The method achieves state-of-the-art results on unseen parts across multiple benchmarks, delivers competitive rendering speed, and handles occlusion robustly, though extreme occlusion and loose garments remain challenging.

Abstract

In this paper, we present WonderHuman, which reconstructs dynamic human avatars from a monocular video for high-fidelity novel view synthesis. Previous dynamic human avatar reconstruction methods typically require the input video to fully cover the observed human body. In daily practice, however, one typically has access only to limited viewpoints, such as monocular front-view videos, which makes it difficult for previous methods to reconstruct the unseen parts of the human avatar. To tackle this issue, WonderHuman leverages 2D generative diffusion model priors to achieve high-quality, photorealistic reconstructions of dynamic human avatars from monocular videos, including accurate rendering of unseen body parts. Our approach introduces a Dual-Space Optimization technique, applying Score Distillation Sampling (SDS) in both canonical and observation spaces to ensure visual consistency and enhance realism in dynamic human reconstruction. Additionally, we present a View Selection strategy and Pose Feature Injection to enforce consistency between SDS predictions and observed data, preserving pose-dependent effects and yielding higher fidelity in the reconstructed avatar. In the experiments, our method achieves state-of-the-art performance in producing photorealistic renderings from the given monocular video, particularly for the challenging unseen parts. The project page and source code can be found at https://wyiguanw.github.io/WonderHuman/.
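
To make the Dual-Space Optimization idea concrete, the following is a minimal NumPy sketch of an SDS gradient evaluated on two renderings, one from the canonical space and one from the observation space. It is an illustration under stated assumptions, not the authors' implementation: the function names, the toy denoiser, and the weights `lam_can` and `lam_obs` are all hypothetical, and a real system would use a pretrained diffusion model and backpropagate these image-space gradients through the Gaussian Splatting renderer.

```python
import numpy as np

def sds_grad(x, denoiser, t, alpha_bar, rng, weight=1.0):
    """Image-space SDS gradient for one rendering x (hypothetical helper)."""
    # Forward-diffuse the rendering: x_t = sqrt(a) * x + sqrt(1 - a) * eps
    eps = rng.standard_normal(x.shape)
    x_t = np.sqrt(alpha_bar) * x + np.sqrt(1.0 - alpha_bar) * eps
    # SDS drops the denoiser Jacobian, leaving w(t) * (eps_pred - eps)
    return weight * (denoiser(x_t, t) - eps)

def dual_space_sds_grads(x_can, x_obs, denoiser, t, alpha_bar, rng,
                         lam_can=0.5, lam_obs=0.5):
    """Apply SDS to both renderings; lam_* are assumed balancing weights."""
    g_can = lam_can * sds_grad(x_can, denoiser, t, alpha_bar, rng)
    g_obs = lam_obs * sds_grad(x_obs, denoiser, t, alpha_bar, rng)
    return g_can, g_obs

def make_toy_denoiser(alpha_bar):
    """Stand-in for a diffusion model: the exact noise predictor when the
    'data distribution' is a single all-zero image (assumption for the demo)."""
    def denoiser(x_t, t):
        return x_t / np.sqrt(1.0 - alpha_bar)
    return denoiser

rng = np.random.default_rng(0)
alpha_bar = 0.5
denoiser = make_toy_denoiser(alpha_bar)
x_can = np.full((4, 4, 3), 0.2)   # stand-in canonical-space rendering
x_obs = np.full((4, 4, 3), -0.1)  # stand-in observation-space rendering
g_can, g_obs = dual_space_sds_grads(x_can, x_obs, denoiser, t=500,
                                    alpha_bar=alpha_bar, rng=rng)
```

With this toy denoiser the sampled noise cancels, so each gradient is proportional to its own rendering and a gradient-descent step pulls both renderings toward the denoiser's preferred image. The point of the sketch is only that the same diffusion prior supervises both spaces at once.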
