Human as Points: Explicit Point-based 3D Human Reconstruction from Single-view RGB Images

Yingzhi Tang; Qijian Zhang; Junhui Hou; Yebin Liu


TL;DR

HaP tackles single-view 3D human reconstruction by replacing implicit representations with an explicit, point-based pipeline. It jointly leverages depth maps and rectified SMPL priors, using a conditional diffusion model to generate a complete human point cloud in 3D space, followed by refinement and mesh extraction. The approach introduces an SMPL rectification module and a diffusion-based 3D generator conditioned on depth and SMPL cues, plus a new CityUHuman dataset with detailed scans. Empirical results show 20–40% improvements over state-of-the-art implicit methods and competitive performance against advanced explicit/hybrid techniques, underscoring the practical value of explicit, geometry-centric design for robust, richly detailed 3D human reconstruction.
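HaP starts from an estimated depth map, and the Figure 4 caption refers to registering SMPL against a "partial point cloud". Below is a minimal sketch of turning a depth map into such a partial point cloud. It assumes a pinhole camera with hypothetical intrinsics `fx`, `fy`, `cx`, `cy` and treats non-positive depth as background. This summary does not state which camera model HaP uses, and some single-view human pipelines use an orthographic camera instead.

```python
import numpy as np


def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth map of shape (H, W) into an (N, 3) point cloud.

    Assumption (not from HaP): pinhole camera with intrinsics fx, fy, cx, cy;
    pixels with depth <= 0 are treated as background and dropped.
    """
    h, w = depth.shape
    # u holds column indices (image x), v holds row indices (image y).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0
    z = depth[valid]
    # Invert the pinhole projection u = fx * x / z + cx (same for v, y).
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=1)
```

For example, a 4x4 map whose only valid pixel sits at row 1, column 2 with depth 2.0, using fx = fy = cx = cy = 2, yields the single point (0, -1, 2).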

Abstract

The latest research on single-view human reconstruction focuses on learning deep implicit functions constrained by explicit body shape priors. Despite remarkable performance improvements over traditional processing pipelines, existing learning approaches still exhibit limitations in flexibility, generalizability, robustness, and/or representation capability. To comprehensively address these issues, in this paper we investigate an explicit point-based human reconstruction framework called HaP, which adopts point clouds as the intermediate representation of the target geometric structure. Technically, our approach is characterized by fully explicit point cloud estimation, manipulation, generation, and refinement in 3D geometric space, instead of an implicit learning process that can be ambiguous and less controllable. The overall workflow is carefully organized, with dedicated designs for the corresponding specialized learning components and processing procedures. Extensive experiments demonstrate that our framework achieves quantitative performance improvements of 20% to 40% over current state-of-the-art methods, as well as better qualitative results. These promising results may indicate a paradigm rollback to fully explicit, geometry-centric algorithm design, which makes it possible to exploit various powerful point cloud modeling architectures and processing techniques. We will make our code and data publicly available at https://github.com/yztang4/HaP.
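The generation step operates directly on point coordinates, and the TL;DR describes it as a conditional diffusion model. As a hedged illustration of the forward (noising) half of such a model, the sketch applies the standard DDPM closed form $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$ to an (N, 3) point cloud. The linear schedule (1e-4 to 0.02 over 1000 steps) is the common DDPM default and is an assumption here, not a value reported for HaP. The conditional denoiser, which in HaP uses depth and SMPL cues, is not sketched.

```python
import numpy as np


def linear_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t of a linear beta schedule.

    Assumption: standard DDPM defaults, not HaP's published settings.
    """
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t ~ q(x_t | x_0) for a point cloud x0 of shape (N, 3).

    Returns x_t together with the Gaussian noise eps, the usual
    regression target for a noise-predicting denoiser.
    """
    eps = rng.standard_normal(x0.shape)
    a = alpha_bars[t]
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps, eps
```

Returning `eps` alongside `x_t` mirrors the noise-prediction objective; at large `t` the cloud is almost pure Gaussian noise, which is where generation starts at inference time.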

Paper Structure (33 sections, 5 equations, 20 figures, 6 tables)

Introduction
Related Work
  Monocular Depth Estimation
  SMPL Estimation and Rectification
  Point Cloud Representation
  Clothed Human Reconstruction
  3D Generative Models
Proposed Method
  Overview
  Estimating 3D Information from Single 2D Images
    Depth Estimation
    SMPL Estimation and Rectification
  Diffusion-based Explicit Generation of Human Body
    Diffusion Stage
    Refinement Stage

Figures (20)

Figure 1: Visual comparisons of reconstructed human bodies and distance error maps by different methods. (a) PIFu saito2019pifu, (b) ICON xiu2022icon, (c) IntegratedPIFu chan2022integratedpifu, (d) ECON xiu2022econ, (e) Proposed HaP. Our method can reconstruct clothing details and poses better than existing methods. Zoom in for detailed geometry.
Figure 2: Illustration of samples with texture and geometry details from our proposed dataset, CityUHuman. Zoom in for detailed geometry. We also refer readers to cityuhumanvideodemo.mp4 in the supplementary material.
Figure 3: The pipeline of our framework HaP. HaP first estimates a depth map and an SMPL model from the RGB input, which serve as conditions for a diffusion process that generates sparse human point clouds. At the refinement stage, we further propose a refinement network $\texttt{PNet}_{\Theta_2}(\cdot)$ and a depth replacement operation to enhance the quality of $\mathcal{H}_{\mathrm{coarse}}$, and finally reconstruct meshes from $\mathcal{H}_{\mathrm{final}}$ via screened Poisson surface reconstruction kazhdan2013screened. PC: Point Cloud. FPS: Farthest Point Sampling qi2017pointnet++.
Figure 4: Illustration of the SMPL rectification process. (a) Initially estimated SMPL model. (b) Registration before rectification. (c) Rectification process. (d) Registration after rectification. The directly estimated SMPL model is poorly registered with the partial point cloud; the misalignment is substantially reduced after rectification.
Figure 5: Illustration of the point cloud generation process. (a) Diffusion process. (b) Refinement process.
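The Figure 3 caption lists farthest point sampling (FPS) among the pipeline's operations. The sketch below shows the standard greedy FPS algorithm in NumPy. The function name and the use of index 0 as the default seed point are assumptions for illustration, and this summary does not specify where in HaP FPS is applied or how many points it keeps.

```python
import numpy as np


def farthest_point_sampling(points, k, start=0):
    """Greedy FPS: return k indices into an (N, 3) array of points.

    Each new sample is the point farthest from everything selected so far,
    which spreads the subset evenly over the surface.
    """
    n = points.shape[0]
    if not 0 < k <= n:
        raise ValueError("k must be in the range 1..N")
    selected = np.empty(k, dtype=np.int64)
    selected[0] = start
    # Squared distance from every point to its nearest selected point.
    min_d2 = np.sum((points - points[start]) ** 2, axis=1)
    for i in range(1, k):
        idx = int(np.argmax(min_d2))
        selected[i] = idx
        d2 = np.sum((points - points[idx]) ** 2, axis=1)
        min_d2 = np.minimum(min_d2, d2)
    return selected
```

This version costs O(N·k) time, which is fine for illustration but slow for very large clouds.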
