LiveHPS: LiDAR-based Scene-level Human Pose and Shape Estimation in Free Environment

Yiming Ren, Xiao Han, Chengfeng Zhao, Jingya Wang, Lan Xu, Jingyi Yu, Yuexin Ma

TL;DR

LiveHPS addresses LiDAR-based scene-level 3D human pose and shape estimation in free environments by introducing a vertex-guided adaptive distillation framework, a consecutive pose optimizer that leverages temporal-spatial cues, and a skeleton-aware translation solver. The approach delivers full SMPL parameter estimation (pose, shape, translation) from single-LiDAR data without lighting or wearable constraints, achieving state-of-the-art results on FreeMotion and other datasets. A large-scale FreeMotion dataset with multi-view, multi-modal annotations supports robust learning and benchmarking, including privacy-conscious data handling. The work demonstrates strong generalization, occlusion robustness, and real-time performance, highlighting practical applicability for real-world robotics, AR/VR, and autonomous systems.

Abstract

For human-centric large-scale scenes, fine-grained modeling of 3D human global pose and shape is important for scene understanding and can benefit many real-world applications. In this paper, we present LiveHPS, a novel single-LiDAR-based approach for scene-level human pose and shape estimation that is not limited by lighting conditions or wearable devices. In particular, we design a distillation mechanism to mitigate the distribution-varying effect of LiDAR point clouds, and we exploit the temporal-spatial geometric and dynamic information in consecutive frames to handle occlusion and noise. LiveHPS, with its efficient configuration and high-quality output, is well-suited for real-world applications. Moreover, we propose a large-scale human motion dataset, named FreeMotion, which is collected in various scenarios with diverse human poses, shapes, and translations. It consists of multi-modal and multi-view data acquired from calibrated and synchronized LiDARs, cameras, and IMUs. Extensive experiments on our new dataset and other public datasets demonstrate the state-of-the-art (SOTA) performance and robustness of our approach. We will release our code and dataset soon.

Paper Structure

This paper contains 31 sections, 11 equations, 15 figures, 6 tables.

Figures (15)

  • Figure 2: The pipeline of LiveHPS. With sequential LiDAR point clouds as input, LiveHPS consists of three critical modules to obtain human SMPL parameters: a point-based body tracker to distill the pose-prior information, a consecutive pose optimizer to refine the pose by using joint-wise features, and a multi-head SMPL solver to regress the parameters of human models.
  • Figure 3: The detailed feature interaction mechanism in the consecutive pose optimizer (CPO). The same network architecture is applied in both the consecutive pose optimizer and the multi-head solver (pose and shape), except for the decoder. Here we take the consecutive pose optimizer as the reference.
  • Figure 4: The capture systems of FreeMotion. In (a), we use a dense-camera capture system with LiDARs for accurate pose and shape capture. In (b), we set LiDARs and cameras at three views to capture human motions in large-scale multi-person scenes.
  • Figure 5: Qualitative comparisons. A result that matches the point cloud more closely indicates more accurate estimation of pose, shape, and translation. The point cloud lies far from the results of MOVIN.
  • Figure 6: Qualitative comparisons in cross-dataset evaluation. SemanticKITTI and HucenLife do not provide SMPL annotations.
  • ...and 10 more figures
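
The three-module pipeline described in the Figure 2 caption can be read as a simple data flow: per-frame point clouds go to a body tracker, the resulting joint sequence goes to a consecutive pose optimizer, and a solver outputs SMPL parameters. The Python sketch below illustrates only that flow. All function names and the placeholder logic (centroid, moving average, zero outputs) are assumptions for illustration and do not reproduce the authors' networks. The sizes 24 (joints) and 10 (shape coefficients) follow the commonly used SMPL configuration.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]

NUM_JOINTS = 24  # standard SMPL joint count (assumed configuration)
NUM_BETAS = 10   # commonly used number of SMPL shape coefficients


@dataclass
class SMPLParams:
    pose: List[Point]    # per-joint axis-angle rotation, 24 x 3
    shape: List[float]   # shape coefficients, length 10
    translation: Point   # global translation


def body_tracker(frame: Sequence[Point]) -> List[Point]:
    """Hypothetical stand-in for the point-based body tracker.

    Places every joint at the centroid of the frame's points.
    """
    n = len(frame)
    centroid = tuple(sum(p[i] for p in frame) / n for i in range(3))
    return [centroid] * NUM_JOINTS


def pose_optimizer(joint_seq: List[List[Point]]) -> List[List[Point]]:
    """Hypothetical stand-in for the consecutive pose optimizer.

    Smooths each joint with a moving average over neighbouring frames.
    """
    refined = []
    for t in range(len(joint_seq)):
        window = joint_seq[max(0, t - 1):min(len(joint_seq), t + 2)]
        refined.append([
            tuple(sum(f[j][i] for f in window) / len(window) for i in range(3))
            for j in range(NUM_JOINTS)
        ])
    return refined


def smpl_solver(joints: List[Point]) -> SMPLParams:
    """Hypothetical stand-in for the multi-head solver.

    Returns zero pose and shape, and uses joint 0 as the translation.
    """
    return SMPLParams(
        pose=[(0.0, 0.0, 0.0)] * NUM_JOINTS,
        shape=[0.0] * NUM_BETAS,
        translation=joints[0],
    )


def run_pipeline(frames: Sequence[Sequence[Point]]) -> List[SMPLParams]:
    coarse = [body_tracker(f) for f in frames]   # stage 1: per-frame joints
    refined = pose_optimizer(coarse)             # stage 2: temporal refinement
    return [smpl_solver(j) for j in refined]     # stage 3: SMPL parameters


# Two toy frames with two points each.
frames = [[(0, 0, 0), (2, 0, 0)], [(1, 0, 0), (3, 0, 0)]]
results = run_pipeline(frames)
print(len(results), results[0].translation)
```

Keeping the stages as separate functions mirrors the modular description in the caption: each placeholder could be swapped for a learned model without changing `run_pipeline`.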