
Map-Aware Human Pose Prediction for Robot Follow-Ahead

Qingyuan Jiang, Burak Susam, Jun-Jee Chao, Volkan Isler

TL;DR

This work addresses the problem of forecasting the full 3D trajectory of a human in complex indoor environments with junctions and multiple corridors. It shows that one can first predict the 2D trajectory and then estimate the full 3D trajectory by conditioning the estimator on the predicted 2D trajectory.
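The two-stage decomposition can be illustrated with a minimal Python sketch. Both stages below are hypothetical, hand-written placeholders, not the learned networks. `predict_path_2d` extrapolates the last 2D displacement at constant velocity and ignores the map. `predict_poses_3d` places a root-relative skeleton at each predicted waypoint, rotated to face the direction of travel. The sketch only shows the data flow: the 3D output is conditioned on the 2D trajectory that was predicted first.

```python
import math
from typing import List, Tuple

Point2D = Tuple[float, float]
Joint3D = Tuple[float, float, float]  # root-relative: x forward, y left, z up


def predict_path_2d(history: List[Point2D], horizon: int) -> List[Point2D]:
    """Stage 1 placeholder (hypothetical): constant-velocity extrapolation of
    the 2D root trajectory. A learned, map-aware network would go here; this
    stand-in ignores the occupancy map entirely."""
    (x0, y0), (x1, y1) = history[-2], history[-1]
    vx, vy = x1 - x0, y1 - y0  # displacement per time step
    return [(x1 + vx * k, y1 + vy * k) for k in range(1, horizon + 1)]


def predict_poses_3d(local_pose: List[Joint3D], path: List[Point2D],
                     start: Point2D) -> List[List[Joint3D]]:
    """Stage 2 placeholder (hypothetical): for every predicted 2D waypoint,
    rotate the root-relative skeleton to face the direction of travel and
    translate it to that waypoint, so the 3D output depends on the stage-1 path."""
    poses = []
    prev = start
    for x, y in path:
        yaw = math.atan2(y - prev[1], x - prev[0])  # heading along the path
        c, s = math.cos(yaw), math.sin(yaw)
        poses.append([(x + c * jx - s * jy, y + s * jx + c * jy, jz)
                      for jx, jy, jz in local_pose])
        prev = (x, y)
    return poses
```

For example, with the history `[(0.0, 0.0), (0.5, 0.0)]` and a horizon of 3, stage 1 yields waypoints at x = 1.0, 1.5 and 2.0. Stage 2 then places a hypothetical two-joint skeleton at each waypoint, facing along the x axis. A learned model would replace both placeholders. The useful property is the interface: the 3D stage never has to reason about the map directly.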

Abstract

In the robot follow-ahead task, a mobile robot must maintain its relative position in front of a moving human actor while keeping the actor in sight. To accomplish this task, the robot must understand the full 3D pose of the human (since the head orientation can differ from that of the torso) and predict future human poses so that it can plan accordingly. This prediction task is especially difficult in complex environments with junctions and multiple corridors. In this work, we address the problem of forecasting the full 3D trajectory of a human in such environments. Our main insight is that one can first predict the 2D trajectory and then estimate the full 3D trajectory by conditioning the estimator on the predicted 2D trajectory. With this approach, we achieve results comparable to or better than state-of-the-art methods while running three times faster. As part of our contribution, we present a new dataset in which, in contrast to existing datasets, the human moves through an area much larger than a single room. We also present a complete robot system that integrates our human pose forecasting network on the mobile robot to enable real-time robot follow-ahead, and we report results from real-world experiments in multiple buildings on campus. Our project page, including supplementary material and videos, can be found at: https://qingyuan-jiang.github.io/iros2024_poseForecasting/
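The geometry of a single follow-ahead goal can be sketched under stated assumptions. Take one predicted human pose in the map frame as (x, y, yaw). The hypothetical helper `follow_ahead_goal` places the robot a fixed distance ahead along the human's heading. It aligns the robot's heading with the human's, so that a rear-facing camera looks back at the actor. The hypothetical `human_in_rear_view` checks the "keep the actor in sight" condition against an assumed horizontal field of view. The distance and field-of-view values are illustrative placeholders, not parameters from the paper. This computes a per-time-step target only and is not the predictive controller itself.

```python
import math
from typing import Tuple

Pose2D = Tuple[float, float, float]  # (x, y, yaw) in the map frame


def follow_ahead_goal(human: Pose2D, distance: float = 1.5) -> Pose2D:
    """Hypothetical helper: place the robot `distance` metres ahead of the
    human along the human's heading, with the robot's yaw aligned to that
    heading so a rear-facing camera points back at the human.
    The default distance is an illustrative assumption."""
    x, y, yaw = human
    return (x + distance * math.cos(yaw), y + distance * math.sin(yaw), yaw)


def human_in_rear_view(robot: Pose2D, human_xy: Tuple[float, float],
                       half_fov: float = math.radians(40.0)) -> bool:
    """Hypothetical helper: True if the human lies within the assumed
    horizontal field of view of a camera facing opposite to the robot's heading."""
    rx, ry, ryaw = robot
    bearing = math.atan2(human_xy[1] - ry, human_xy[0] - rx)
    cam_yaw = ryaw + math.pi  # rear camera looks backwards
    # Wrap the angular difference into [-pi, pi] before comparing.
    diff = math.atan2(math.sin(bearing - cam_yaw), math.cos(bearing - cam_yaw))
    return abs(diff) <= half_fov
```

For a human at the origin walking along +x, `follow_ahead_goal((0.0, 0.0, 0.0))` returns `(1.5, 0.0, 0.0)`, and the human is then directly behind the robot, inside the rear camera's view. Applying the goal function to each pose in a predicted sequence gives one target per future time step. Those targets are what a planner would need to anticipate, for example, a turn at a junction.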

Paper Structure

This paper contains 19 sections, 6 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: (a) The robot follow-ahead task: a mobile robot keeps a human actor in sight while driving in front of them in an indoor environment. (b) Map-aware human pose prediction: to achieve the follow-ahead task, given the pose histories (shown in green), we predict long-term human poses (shown in red, with ground truth in blue) by incorporating local map information, and these predictions serve as input to a predictive robot controller.
  • Figure 2: Robot. Our mobile robot is built on a Rover robot base and equipped with two RealSense RGB-D cameras. The front camera builds the map used for localization while navigating. The rear camera detects and tracks the human actor to estimate 3D skeleton poses. The robot coordinate frame is also shown.
  • Figure 3: Network. Our network has two parts: a PathNet that predicts the human trajectory and a PoseNet that predicts future human poses. The PathNet takes the occupancy map and the observed human trajectory as input and predicts the future human trajectory. The PoseNet takes the PathNet predictions and the local pose as input and predicts future poses with a Gated Recurrent Unit (GRU) based network.
  • Figure 4: Qualitative results of human pose forecasting. Each column corresponds to a different method. Human pose history, prediction results, and ground truth are shown in green, red, and blue, respectively. TR vaswani_attention_2017 and LT cao_long-term_2020 fail to predict the turn (red outline), whereas CA mao_contact-aware_2022 and ours predict it correctly (black outline).
  • Figure 5: Robot follow-ahead. We visualize the planned path based on the human pose predictions. The human trajectory history, trajectory prediction, and ground truth are shown in green, red, and blue, respectively. The planned robot path at each time step is shown as arrows. The map is shown in the background, with obstacles in white and free space in gray.
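The Figure 3 caption describes the GRU-based PoseNet only at the block level. The following pure-Python sketch shows the general pattern of a GRU decoder that is conditioned on a predicted 2D path and rolled out step by step. Everything beyond "GRU conditioned on the predicted trajectory and the local pose" is an assumption for illustration, not the authors' architecture: the class names `GRUCell` and `PoseDecoder`, the input layout (2D waypoint concatenated with the previous pose), the residual pose update, the hidden size, the omitted biases, and the random untrained weights.

```python
import math
import random
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Small random weights; a trained model would learn these."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class GRUCell:
    """Minimal GRU cell (biases omitted for brevity):
    z = sigmoid(Wz x + Uz h), r = sigmoid(Wr x + Ur h),
    n = tanh(Wn x + Un (r * h)), h' = z * h + (1 - z) * n."""

    def __init__(self, input_size: int, hidden_size: int, rng: random.Random):
        self.w: Dict[str, Matrix] = {g: rand_matrix(hidden_size, input_size, rng) for g in "zrn"}
        self.u: Dict[str, Matrix] = {g: rand_matrix(hidden_size, hidden_size, rng) for g in "zrn"}

    def step(self, x: Vector, h: Vector) -> Vector:
        def gate(g: str, hv: Vector) -> Vector:
            return [a + b for a, b in zip(matvec(self.w[g], x), matvec(self.u[g], hv))]

        z = [sigmoid(v) for v in gate("z", h)]  # update gate
        r = [sigmoid(v) for v in gate("r", h)]  # reset gate
        n = [math.tanh(v) for v in gate("n", [ri * hi for ri, hi in zip(r, h)])]
        return [zi * hi + (1.0 - zi) * ni for zi, hi, ni in zip(z, h, n)]


class PoseDecoder:
    """Hypothetical PoseNet-style decoder: each future step is conditioned on
    one predicted 2D waypoint and on the previous local (root-relative) pose,
    stored as a flat vector of joint coordinates."""

    def __init__(self, pose_dim: int, hidden_size: int = 16, seed: int = 0):
        rng = random.Random(seed)
        self.cell = GRUCell(pose_dim + 2, hidden_size, rng)
        self.readout = rand_matrix(pose_dim, hidden_size, rng)
        self.hidden_size = hidden_size

    def rollout(self, local_pose: Vector, path: List[List[float]]) -> List[Vector]:
        h = [0.0] * self.hidden_size
        pose = list(local_pose)
        outputs = []
        for waypoint in path:
            h = self.cell.step(list(waypoint) + pose, h)
            # Residual update (an assumption): the readout predicts a pose change.
            pose = [p + d for p, d in zip(pose, matvec(self.readout, h))]
            outputs.append(pose)
        return outputs
```

For instance, `PoseDecoder(pose_dim=6, hidden_size=8).rollout([0.0] * 6, path)` with five waypoints returns five 6-dimensional pose vectors, one per future step. Because the hidden state starts at zero and each update mixes it with a `tanh` output, every hidden value stays strictly inside (-1, 1). This bounded recurrent state is one reason GRUs are a common choice for stepwise sequence decoders.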