Few-Shot Multi-Human Neural Rendering Using Geometry Constraints

Qian Li, Victoria Fernàndez Abrevaya, Franck Multon, Adnane Boukhayma

TL;DR

This work tackles the problem of reconstructing the shape and radiance of multi-human scenes from sparse multi-view images. It introduces a geometry-guided pipeline that initializes an implicit surface with SMPL-based geometry, models each human via a union of bounding boxes, and refines appearance through a hybrid foreground-background rendering framework. The method incorporates an uncertainty-aware SDF loss, a ray-consistency loss, and a saturation loss to address sparsity and illumination variability, yielding state-of-the-art results on real CMU Panoptic data and synthetic MultiHuman data. The approach enables robust few-shot multi-human reconstruction and rendering, with practical benefits for editing and downstream analysis, while acknowledging limitations in modeling close interactions between people.

Abstract

We present a method for recovering the shape and radiance of a scene consisting of multiple people given only a few images. Multi-human scenes are complex due to additional occlusion and clutter. For single-human settings, existing approaches using implicit neural representations have achieved impressive results that deliver accurate geometry and appearance. However, it remains challenging to extend these methods to estimate multiple humans from sparse views. We propose a neural implicit reconstruction method that addresses the inherent challenges of this task through the following contributions: First, we use geometry constraints by exploiting meshes pre-computed with a human body model (SMPL). Specifically, we regularize the signed distances using the SMPL mesh and leverage bounding boxes for improved rendering. Second, we propose a ray regularization scheme to minimize rendering inconsistencies, and a saturation regularization for robust optimization under variable illumination. Extensive experiments on both real and synthetic datasets demonstrate the benefits of our approach and show state-of-the-art performance against existing neural reconstruction methods.
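The abstract mentions leveraging bounding boxes for rendering, and the TL;DR describes the foreground as a union of per-person boxes. The sketch below illustrates one way such a union could be intersected with a camera ray so that samples are placed only inside the foreground. It assumes axis-aligned boxes given as (min corner, max corner) pairs, and the function names `ray_box_interval` and `foreground_intervals` are hypothetical; the paper's actual box construction and sampling strategy may differ.

```python
from typing import List, Optional, Sequence, Tuple

Vec3 = Sequence[float]
Box = Tuple[Vec3, Vec3]  # assumed layout: (min corner, max corner), axis-aligned


def ray_box_interval(origin: Vec3, direction: Vec3, box: Box) -> Optional[Tuple[float, float]]:
    """Slab test: return (t_near, t_far) where the ray lies inside the box, or None."""
    t_near, t_far = 0.0, float("inf")
    lo, hi = box
    for o, d, b_lo, b_hi in zip(origin, direction, lo, hi):
        if abs(d) < 1e-12:
            # Ray is parallel to this slab: it must start between the two planes.
            if o < b_lo or o > b_hi:
                return None
            continue
        t0, t1 = (b_lo - o) / d, (b_hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return None
    return t_near, t_far


def foreground_intervals(origin: Vec3, direction: Vec3, boxes: List[Box]) -> List[Tuple[float, float]]:
    """Merge per-box intervals into the ray's intersection with the union of boxes."""
    hits = sorted(
        h for h in (ray_box_interval(origin, direction, b) for b in boxes) if h is not None
    )
    merged: List[Tuple[float, float]] = []
    for start, end in hits:
        if merged and start <= merged[-1][1]:
            # Overlapping boxes (e.g. two people standing close): extend the interval.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

Under these assumptions, samples along a ray would be drawn from the merged intervals for the foreground model, while the rest of the ray is left to the background model.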

Paper Structure

This paper contains 28 sections, 11 equations, 12 figures, 5 tables.

Figures (12)

  • Figure 1: Overview. We address the multi-human implicit shape and appearance learning problem by initializing the geometry using SMPL (Sec. \ref{sec:method_geometric_init}), along with uncertainty-based SDF supervision and novel photometric regularizations designed to compensate for the lack of views (Sec. \ref{sec:method_regularizations}). We also model the foreground (union of SMPL bounding boxes) and the remainder of the scene separately (Sec. \ref{sec:method_bboxes}).
  • Figure 2: Qualitative comparison against NeuS [wang2021neus] and VolSDF [yariv2021volume] of synthesised novel views and reconstructed normal images of multiple humans on the CMU Panoptic dataset [Simon_2017_CVPR, Joo_2017_TPAMI], using 5/10/15/20 training views.
  • Figure 3: Quantitative comparison of average PSNR (↑), SSIM (↑) and LPIPS (↓) as the number of training views increases.
  • Figure 4: Comparison against ARAH [wang2022arah] from 5 training views. PSNRs for the 3 examples are respectively: 26.97/29.56, 27.48/33.66, 24.36/30.56 (ARAH/Ours).
  • Figure 5: Qualitative comparison of synthesised novel views and reconstructed normal images on the synthetic dataset (MultiHuman-Dataset [zheng2021deepmulticap]) with 10 and 15 training views respectively.
  • ...and 7 more figures
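
The figure captions above report PSNR in dB. For readers unfamiliar with the metric, here is a minimal sketch of the standard definition, PSNR = 10 · log10(MAX² / MSE), applied to flattened pixel values. It assumes intensities normalised to [0, 1]; the function name `psnr` and its parameters are illustrative, and the paper's evaluation code may differ (for example in masking or colour handling).

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], rendered: Sequence[float], max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two flattened images."""
    if len(reference) != len(rendered) or len(reference) == 0:
        raise ValueError("images must be non-empty and have the same number of values")
    # Mean squared error over all pixel values.
    mse = sum((a - b) ** 2 for a, b in zip(reference, rendered)) / len(reference)
    if mse == 0.0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher values indicate a rendering closer to the reference, which is why PSNR is marked with ↑ in the Figure 3 caption.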