Towards Practical Human Motion Prediction with LiDAR Point Clouds

Xiao Han, Yiming Ren, Yichen Yao, Yujing Sun, Yuexin Ma

TL;DR

LiDAR-HMP is proposed as the first single-LiDAR-based 3D human motion prediction approach. It takes the raw LiDAR point cloud as input and forecasts future 3D human poses directly, adaptively mapping the observed motion manifold to future poses and modeling the spatial-temporal correlations of human motion to refine the predictions.

Abstract

Human motion prediction is crucial for human-centric multimedia understanding and interaction. Current methods typically rely on ground truth human poses as observed input, which is not practical for real-world scenarios where only raw visual sensor data is available. To deploy these methods in practice, a preliminary pose estimation stage is required. However, such two-stage approaches often suffer performance degradation due to the accumulation of errors. Moreover, reducing raw visual data to sparse keypoint representations significantly lowers the information density, resulting in the loss of fine-grained features. In this paper, we propose LiDAR-HMP, the first single-LiDAR-based 3D human motion prediction approach, which receives the raw LiDAR point cloud as input and forecasts future 3D human poses directly. Building upon our novel structure-aware body feature descriptor, LiDAR-HMP adaptively maps the observed motion manifold to future poses and effectively models the spatial-temporal correlations of human motions to further refine the prediction results. Extensive experiments show that our method achieves state-of-the-art performance on two public benchmarks and demonstrates remarkable robustness and efficacy in real-world deployments.

Paper Structure (34 sections, 12 equations, 7 figures, 9 tables)

This paper contains 34 sections, 12 equations, 7 figures, 9 tables.

Figures (7)

  • Figure 1: The pipeline of our LiDAR-HMP. First, we obtain the structure-aware body feature descriptor from the observed LiDAR point cloud frames. Then, we adaptively predict the human motion with learnable queries for initial predictions and explicitly model the spatial-temporal correlations among them to refine the predicted motions. Finally, we decode the joint-wise results and point-wise results for auxiliary supervision.
  • Figure 2: Qualitative comparisons of long-term predictions on the LIPD and LiDARHuman26M datasets. "GT" denotes the future ground truth skeletons. The green arrow denotes the motion trace.
  • Figure 3: Evaluation of the generalization capability at various distances on LIPD.
  • Figure 4: Visualisation of the results in the occlusion case, demonstrating the robustness of our LiDAR-based human motion prediction approach under occlusion. "GT" denotes the future ground truth skeletons.
  • Figure 5: Visualisation of the results in the noise case, demonstrating the robustness of our LiDAR-based human motion prediction approach in noisy environments. "GT" denotes the future ground truth skeletons.
  • ...and 2 more figures
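
The data flow described in the Figure 1 caption can be illustrated with a rough, shape-level sketch. This is not the authors' implementation: the max-pooled point encoder, the single attention layers, the random weights, and all sizes (10 observed frames, 256 points per frame, 25 future frames, 24 joints, 32 feature channels) are assumptions chosen only to show how raw point cloud frames could be mapped directly to future 3D joint positions. The point-wise auxiliary decoding branch mentioned in the caption is omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend(queries, keys, values):
    """Scaled dot-product attention: (Q, D), (K, D), (K, D) -> (Q, D)."""
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])
    return softmax(scores, axis=-1) @ values

def encode_frames(clouds, w_point):
    """Hypothetical stand-in for the body feature descriptor.
    clouds: (T_obs, N, 3) -> one feature vector per frame, (T_obs, D)."""
    centered = clouds - clouds.mean(axis=1, keepdims=True)  # remove per-frame offset
    feats = np.maximum(centered @ w_point, 0.0)             # shared per-point map + ReLU
    return feats.max(axis=1)                                # max-pool over points

def predict_motion(clouds, params, t_fut, n_joints):
    obs = encode_frames(clouds, params["w_point"])          # (T_obs, D)
    # Learnable queries, one per (future frame, joint), read from observed features.
    init = attend(params["queries"], obs, obs)              # (t_fut * n_joints, D)
    # Refinement: self-attention across all future (time, joint) tokens.
    refined = init + attend(init, init, init)
    joints = refined @ params["w_joint"]                    # (t_fut * n_joints, 3)
    return joints.reshape(t_fut, n_joints, 3)

# Assumed sizes, for illustration only.
T_OBS, N_POINTS, T_FUT, N_JOINTS, DIM = 10, 256, 25, 24, 32
rng = np.random.default_rng(0)
params = {
    "w_point": rng.normal(size=(3, DIM)),
    "queries": rng.normal(size=(T_FUT * N_JOINTS, DIM)),
    "w_joint": rng.normal(size=(DIM, 3)) * 0.1,
}
clouds = rng.normal(size=(T_OBS, N_POINTS, 3))  # synthetic stand-in for LiDAR frames
poses = predict_motion(clouds, params, T_FUT, N_JOINTS)
print(poses.shape)
```

The point of the sketch is the interface: the input is a sequence of raw point cloud frames and the output is a sequence of future joint positions, with no intermediate pose estimation stage whose errors could accumulate.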