Exploring 3D Human Pose Estimation and Forecasting from the Robot's Perspective: The HARPER Dataset

Andrea Avogaro, Andrea Toaiari, Federico Cunico, Xiangmin Xu, Haralambos Dafas, Alessandro Vinciarelli, Emma Li, Marco Cristani

TL;DR

HARPER is a novel dataset for 3D body pose estimation and forecasting in dyadic interactions between users and Spot, the quadruped robot manufactured by Boston Dynamics. It includes reproducible benchmarks for 3D Human Pose Estimation, Human Pose Forecasting, and Collision Prediction.

Abstract

We introduce HARPER, a novel dataset for 3D body pose estimation and forecasting in dyadic interactions between users and Spot, the quadruped robot manufactured by Boston Dynamics. The key novelty is the focus on the robot's perspective, i.e., on the data captured by the robot's sensors. This makes 3D body pose analysis challenging because the sensors, being close to the ground, capture humans only partially. The scenario underlying HARPER includes 15 actions, of which 10 involve physical contact between the robot and users. The corpus contains not only the recordings of Spot's built-in stereo cameras, but also those of a 6-camera OptiTrack system (all recordings are synchronized). This yields ground-truth skeletal representations with sub-millimeter precision. In addition, the corpus includes reproducible benchmarks on 3D Human Pose Estimation, Human Pose Forecasting, and Collision Prediction, all based on publicly available baseline approaches. This enables future HARPER users to rigorously compare their results with those we provide in this work.

Paper Structure (11 sections, 7 figures, 4 tables)

This paper contains 11 sections, 7 figures, 4 tables.

Figures (7)

  • Figure 1: HARPER showcase. (Top-left) We exploit Spot's onboard equipment to let the robot perceive people. (Top-right) Thanks to a 6-camera OptiTrack setup, we provide reference 3D human poses represented with 21 joints, with $0.035$ mm of error. (Second row) An additional external RGB camera shows the actions performed. (Third row) The gripper camera's RGB point of view: the yellow dots are the joints back-projected onto the image plane. (Fourth row) The gripper camera's depth point of view, with the ground-truth joints. Zoom in on the figure for a better view of the joints.
  • Figure 2: A 6-camera OptiTrack system covers a $6\times6$ m area where users and Spot can move freely. The external RGB camera's field of view covers the setting. The 5 greyscale + depth cameras on Spot's body and the frontal RGB-D camera (gripper) cover the environment surrounding the robot.
  • Figure 3: Joint visibility from the robot's perspective. The left chart shows how many frames contain exactly $n$ joints for $n=1,\ldots,21$. The right plot shows the percentage of frames in which the different parts of the skeleton are visible.
  • Figure 4: Distribution of distances between Spot and users (the distance considers the two closest joints of human and robot). Red columns correspond to distances lower than $10$ cm, considered as cases of physical contact.
  • Figure 5: Results of 3D human pose estimation from the robot's perspective. (a) On the left, the 2D joints predicted by HRNet [sun2019deep] (in blue) and the corresponding ground-truth joints (in red). On the right, the depth image with the same 2D detections superimposed; the depth is used to lift the 2D joints to 3D. (b) The lifted 3D poses alongside the complete OptiTrack skeletons. (c) MPJPE (in mm) for every visible joint (inside the depth FOV) on the test set. The size of the blobs is proportional to the errors, while colors are related to the number of times a joint is visible from the robot's perspective.
  • ...and 2 more figures
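
The quantities named in the captions above (the closest-joint distance and 10 cm contact threshold of Figure 4, the depth-based lifting and MPJPE of Figure 5) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the tuple-based joint representation, and the pinhole back-projection used for lifting are assumptions made for illustration, and the camera intrinsics (`fx`, `fy`, `cx`, `cy`) are hypothetical parameters rather than Spot's actual calibration.

```python
import math

# 10 cm, the physical-contact threshold stated in the Figure 4 caption (in meters).
CONTACT_THRESHOLD_M = 0.10


def min_joint_distance(human_joints, robot_joints):
    """Smallest Euclidean distance between any human joint and any robot joint."""
    return min(math.dist(h, r) for h in human_joints for r in robot_joints)


def is_contact(human_joints, robot_joints, threshold=CONTACT_THRESHOLD_M):
    """True when the two closest joints are nearer than the threshold."""
    return min_joint_distance(human_joints, robot_joints) < threshold


def lift_to_3d(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with a depth value to camera coordinates.

    Assumes an ideal pinhole camera; the lifting used in the paper may differ.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def mpjpe(pred, gt, visible=None):
    """Mean per-joint position error over visible joints, in the input units."""
    if visible is None:
        visible = [True] * len(gt)
    errors = [math.dist(p, g) for p, g, v in zip(pred, gt, visible) if v]
    if not errors:
        raise ValueError("no visible joints to evaluate")
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Toy coordinates in meters, made up for illustration only.
    human = [(0.0, 0.0, 1.0), (0.5, 0.0, 1.0)]
    robot = [(0.55, 0.0, 1.0), (2.0, 0.0, 0.5)]
    print("contact:", is_contact(human, robot))
    print("lifted joint:", lift_to_3d(420, 240, 2.0, 500.0, 500.0, 320.0, 240.0))
    print("MPJPE:", mpjpe([(0, 0, 0), (1, 0, 0)], [(0, 0, 3), (1, 4, 0)]))
```

Restricting `mpjpe` to a visibility mask mirrors the caption's "every visible joint (inside the depth FOV)": joints the robot cannot see contribute no error term instead of being counted as zero.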