LiDAR-HMR: 3D Human Mesh Recovery from LiDAR

Bohao Fan, Wenzhao Zheng, Jianjiang Feng, Jie Zhou

TL;DR

The paper tackles 3D human mesh recovery (HMR) from sparse, single-frame LiDAR point clouds, a challenging setting because the data are sparse, noisy, and incomplete. It introduces LiDAR-HMR, a sparse-to-dense pipeline comprising a Pose Regression Network (PRN) that estimates a template pose, a Mesh Reconstruction Network (MRN) that progressively reconstructs a dense mesh via graphormer-based feature propagation, and MeshIK, which recovers SMPL pose and shape parameters from the reconstructed mesh. Key contributions include (1) a point-cloud-driven dense reconstruction approach that preserves local detail, (2) a resolution-consistent feature propagation mechanism across mesh scales, (3) a MeshIK module, focused on local detail, that aligns unconstrained meshes with SMPL parameters, and (4) state-of-the-art performance on four public datasets with favorable efficiency. The method shows robust 3D HMR and human pose estimation (HPE) in outdoor, multimodal settings, and it is practical for real-time or near-real-time systems that process cluttered outdoor LiDAR data, including under poor illumination. Future work may integrate human pose priors or temporal constraints to handle severely missing regions and improve the realism of reconstructed surfaces.
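MeshIK is only described at a high level here. One generic building block for recovering joint rotations from a predicted mesh is to rigidly align each body part's template vertices with the corresponding predicted vertices using the Kabsch (orthogonal Procrustes) algorithm. The NumPy sketch below shows that step; it is a swapped-in standard technique, not the paper's MeshIK, and the per-part vertex correspondence and the names `kabsch_rotation`, `part_template`, and `predicted_part` are hypothetical.

```python
import numpy as np


def kabsch_rotation(template: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Best-fit rotation R with R @ (t_i - mean_t) ~ (p_i - mean_p) for (K, 3) point sets."""
    # Center both sets so that only the rotation remains to be solved.
    a = template - template.mean(axis=0)
    b = target - target.mean(axis=0)
    # SVD of the 3x3 cross-covariance matrix.
    u, _, vt = np.linalg.svd(a.T @ b)
    # Flip the last axis if needed so the result is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(vt.T @ u.T))
    return vt.T @ np.diag([1.0, 1.0, d]) @ u.T


# Toy check: rotate a hypothetical body-part template by 30 degrees about z and shift it.
rng = np.random.default_rng(0)
part_template = rng.normal(size=(50, 3))
angle = np.pi / 6
true_r = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle), np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
predicted_part = part_template @ true_r.T + np.array([0.1, -0.2, 0.3])
est_r = kabsch_rotation(part_template, predicted_part)
print(np.allclose(est_r, true_r))  # True
```

Because SMPL parameterizes each joint rotation relative to its parent in the kinematic tree, a per-part global rotation obtained this way would still have to be converted to a local one, for example $R_{\text{local}} = R_{\text{parent}}^\top R_{\text{part}}$, and shape parameters would need a separate fit.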

Abstract

In recent years, point cloud perception tasks have attracted increasing attention. This paper presents the first attempt to estimate 3D human body meshes from sparse LiDAR point clouds. We found that the major challenges in estimating human pose and mesh from point clouds stem from the sparsity, noise, and incompleteness of LiDAR point clouds. To address these challenges, we propose an effective sparse-to-dense reconstruction scheme for 3D human meshes: it first estimates a sparse representation of the human (the 3D human pose) and then gradually reconstructs the body mesh. To better leverage the 3D structural information of point clouds, we employ a cascaded graph transformer (graphormer) to introduce point cloud features during sparse-to-dense reconstruction. Experimental results on three publicly available datasets demonstrate the effectiveness of the proposed approach. Code: https://github.com/soullessrobot/LiDAR-HMR/
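The abstract states that a cascaded graphormer injects point cloud features into the mesh during reconstruction. The NumPy sketch below shows one plausible block of that kind, assuming single-head cross-attention from mesh-vertex features (queries) to LiDAR point features (keys and values), followed by a graph convolution along mesh edges. The block layout, the names `point_guided_block` and `params`, and the random weights are hypothetical and are not taken from the released code.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the row maximum for numerical stability.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def normalize_adjacency(adj: np.ndarray) -> np.ndarray:
    # Symmetric normalization D^-1/2 (A + I) D^-1/2, as in standard graph convolutions.
    a = adj + np.eye(adj.shape[0])
    deg_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * deg_inv_sqrt[:, None] * deg_inv_sqrt[None, :]


def point_guided_block(vertex_feat, point_feat, adj, params):
    """Hypothetical block: vertices attend to LiDAR points, then mix along mesh edges."""
    d = vertex_feat.shape[1]
    q = vertex_feat @ params["wq"]  # (V, D) queries from mesh vertices
    k = point_feat @ params["wk"]   # (N, D) keys from LiDAR points
    v = point_feat @ params["wv"]   # (N, D) values from LiDAR points
    attn = softmax(q @ k.T / np.sqrt(d), axis=1)  # (V, N), each row sums to 1
    x = vertex_feat + attn @ v                    # residual cross-attention update
    # Graph convolution over the mesh topology as a residual ReLU branch.
    x = x + np.maximum(normalize_adjacency(adj) @ x @ params["wg"], 0.0)
    return x, attn


rng = np.random.default_rng(0)
num_vertices, num_points, dim = 6, 40, 16
params = {name: rng.normal(scale=dim ** -0.5, size=(dim, dim))
          for name in ("wq", "wk", "wv", "wg")}
adj = np.zeros((num_vertices, num_vertices))
for i in range(num_vertices):  # ring-shaped toy mesh graph
    j = (i + 1) % num_vertices
    adj[i, j] = adj[j, i] = 1.0
vertex_feat = rng.normal(size=(num_vertices, dim))
point_feat = rng.normal(size=(num_points, dim))
out, attn = point_guided_block(vertex_feat, point_feat, adj, params)
print(out.shape, attn.shape)  # (6, 16) (6, 40)
```

Cross-attention lets every vertex draw on any LiDAR point regardless of how many points fall near it, while the graph convolution keeps neighboring vertices consistent; the actual module may use multiple heads, normalization layers, or a different ordering of these steps.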

Paper Structure

This paper contains 15 sections, 28 equations, 11 figures, 4 tables, and 1 algorithm.

Figures (11)

  • Figure 1: Three challenges in 3D human mesh reconstruction from single-frame sparse LiDAR point clouds. Point clouds are shown together with the corresponding ground-truth meshes (front and side views).
  • Figure 2: Top: Most 3D HMR methods for point cloud inputs rely on a sparse global feature extracted from the initial point cloud, which loses local detail. Bottom: The proposed point-cloud-to-SMPL pipeline for 3D HMR. Point cloud features guide the coarse-to-fine reconstruction of the human mesh to recover better local detail, and features from coarse meshes are propagated, rather than upsampled, to obtain features for the finer meshes. Finally, a differentiable module named MeshIK obtains SMPL parameters from the fine mesh.
  • Figure 3: Example results of LiDAR-HMR on multi-person scenes from the Waymo sun2020scalability dataset. RGB images are not used by the algorithm; they are shown only for visualization. LiDAR-HMR can reconstruct accurate human meshes under different illumination conditions.
  • Figure 4: The overall structure of the proposed modules. (a) In the pose regression network (PRN), the input point clouds are encoded by PointTransformer-v2 and decoded into $q$ and $\mu$ to obtain a predicted human pose. The predicted pose is then fed into two self-attention layers for refinement and completion. Red arrows indicate the loss functions for $q$ and $\mu$. (b) The mesh reconstruction network (MRN) receives the point cloud features and the template pose estimated by the PRN, and gradually reconstructs the complete human mesh. A point-cloud-based graphormer lin2021mesh is applied at each intermediate resolution to introduce point cloud features during reconstruction. Vertex features are inherited through a propagation module that models the parent-children relationships established during coarsening. Finally, a fully connected layer produces the fine human mesh. In both panels, the shape of the intermediate features is denoted $(x,y)$, where $N$ is the number of input points, $J$ the number of keypoints, $V$ the number of vertices, and $D$ the fixed feature dimension for attention.
  • Figure 5: Mesh coarsening progressively generates multiple coarse graphs using heavy edge matching (HEM), following choi2020pose2mesh. The MRN then performs the reconstruction in reverse, using the parent-children relationships recorded during coarsening: vertex features are propagated along parent-children edges to generate a higher-resolution mesh.
  • ...and 6 more figures
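Figures 2 and 5 describe coarsening a mesh graph with heavy edge matching and then propagating vertex features back from coarse to fine along parent-children edges. The NumPy sketch below shows one coarsening level and the simplest form of that propagation (each fine vertex copies its parent's feature). It assumes a symmetric edge-weight matrix and visits vertices in index order for determinism; practical coarsening code often uses a different visiting order or normalized edge weights, and the paper's propagation module may add learned layers. The names `heavy_edge_matching` and `propagate_features` are hypothetical.

```python
import numpy as np


def heavy_edge_matching(weights: np.ndarray) -> np.ndarray:
    """Coarsen a graph once; return parent[i] = coarse index of fine vertex i.

    weights: symmetric (V, V) edge-weight matrix, 0 where there is no edge.
    """
    num_fine = weights.shape[0]
    parent = -np.ones(num_fine, dtype=int)
    next_coarse = 0
    for i in range(num_fine):  # visit vertices in index order (deterministic)
        if parent[i] >= 0:
            continue  # already matched to an earlier vertex
        parent[i] = next_coarse
        # Candidate partners: neighbors of i that are still unmatched.
        candidates = [j for j in np.flatnonzero(weights[i]) if parent[j] < 0]
        if candidates:
            # Merge i with the neighbor connected by the heaviest edge.
            best = max(candidates, key=lambda j: weights[i, j])
            parent[best] = next_coarse
        next_coarse += 1
    return parent


def propagate_features(coarse_feat: np.ndarray, parent: np.ndarray) -> np.ndarray:
    # Each fine vertex inherits the feature of its coarse parent (no interpolation).
    return coarse_feat[parent]


# Toy graph: edges 0-1 (w=1), 0-2 (w=5), 1-3 (w=2), 2-3 (w=1).
w = np.zeros((4, 4))
for a, b, wt in [(0, 1, 1.0), (0, 2, 5.0), (1, 3, 2.0), (2, 3, 1.0)]:
    w[a, b] = w[b, a] = wt
parent = heavy_edge_matching(w)
print(parent.tolist())  # [0, 1, 0, 1]
fine_feat = propagate_features(np.array([[1.0, 0.0], [0.0, 1.0]]), parent)
print(fine_feat.tolist())  # [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
```

Vertex 0 pairs with vertex 2 because their edge (weight 5) is heavier than the 0-1 edge, and vertices 1 and 3 then form the second coarse vertex. Applying the function repeatedly yields the multi-level hierarchy, and the stored `parent` arrays are what the reverse, coarse-to-fine pass needs.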