HINT: Learning Complete Human Neural Representations from Limited Viewpoints
Alessandro Sanvito, Andrea Ramazzina, Stefanie Walz, Mario Bijelic, Felix Heide
TL;DR
HINT addresses the challenge of reconstructing complete human avatars from limited viewpoints by splitting the scene into a background NeRF and a canonical-space, SDF-based human model, guided by a sagittal-plane symmetry prior and supervised by depth and segmentation cues. A co-trained Human Digitization Network (HDN) provides priors for unseen views, with targeted losses (including a novel SDF-based supervision) that prevent geometry collapse and promote realistic textures. Quantitative results show substantial gains over prior methods, with PSNR improvements of around 15% and LPIPS reductions of about 34%, demonstrating robust novel-view synthesis from sparse data. The approach enables complete, animatable human representations in real-world, limited-view robotics scenarios, facilitating data augmentation, counterfactual generation, and safer autonomous operation in dynamic environments.
Abstract
No augmented application is possible without animated humanoid avatars. At the same time, generating human replicas from real-world monocular hand-held or robotic sensor setups is challenging because few views of the subject are available. Previous work showed the feasibility of virtual avatars but required 360-degree views of the targeted subject. To address this issue, we propose HINT, a NeRF-based algorithm able to learn a detailed and complete human model from limited viewing angles. We achieve this by introducing a symmetry prior, regularization constraints, and training cues from large human datasets. In particular, we introduce a sagittal-plane symmetry prior on the appearance of the human, directly supervise the density function of the human model using explicit 3D body modeling, and leverage a co-learned human digitization network as additional supervision for the unseen angles. As a result, our method can reconstruct complete humans even from a few viewing angles, improving PSNR by more than 15% compared to previous state-of-the-art algorithms.
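To make the idea of a sagittal-plane symmetry prior on appearance more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a canonical body frame in which the sagittal plane is x = 0, and it penalizes the difference between the color predicted at a point and at its mirror image. The names `mirror_sagittal`, `symmetry_loss`, and the two toy color fields are hypothetical and introduced only for illustration; the exact loss used by HINT is not specified in this summary.

```python
from typing import Callable, Sequence, Tuple

Point = Tuple[float, float, float]
Color = Tuple[float, float, float]


def mirror_sagittal(p: Point) -> Point:
    """Reflect a canonical-space point across the plane x = 0.

    Assumption: the canonical frame is aligned so that x = 0 is the
    sagittal plane of the body.
    """
    x, y, z = p
    return (-x, y, z)


def symmetry_loss(color_fn: Callable[[Point], Color],
                  points: Sequence[Point]) -> float:
    """Mean squared color difference between each point and its mirror."""
    if not points:
        return 0.0
    total = 0.0
    for p in points:
        c = color_fn(p)
        c_mirror = color_fn(mirror_sagittal(p))
        # Average the squared difference over the three color channels.
        total += sum((a - b) ** 2 for a, b in zip(c, c_mirror)) / 3.0
    return total / len(points)


def symmetric_field(p: Point) -> Color:
    """Toy color field that is symmetric about x = 0 (hypothetical)."""
    x, y, z = p
    return (abs(x), y, z)


def asymmetric_field(p: Point) -> Color:
    """Toy color field that changes sign across x = 0 (hypothetical)."""
    x, y, z = p
    return (x, y, z)


if __name__ == "__main__":
    samples = [(0.5, 0.1, 0.2), (-0.3, 0.4, 0.0), (1.0, 0.0, 0.0)]
    print("symmetric field loss: ", symmetry_loss(symmetric_field, samples))
    print("asymmetric field loss:", symmetry_loss(asymmetric_field, samples))
```

In a trained model, `color_fn` would be replaced by the appearance branch of the canonical human model, and a term like this would typically be weighted against the reconstruction losses so that genuinely asymmetric details (for example, a logo on one side of a shirt) observed in the input views can still be represented.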
