UPTor: Unified 3D Human Pose Dynamics and Trajectory Prediction for Human-Robot Interaction

Nisarga Nilavadi, Andrey Rudenko, Timm Linder

TL;DR

UPTor tackles the problem of jointly predicting 3D human pose dynamics and global trajectories from a short pose sequence to support human-aware robot navigation. It introduces a motion transformation that places sequences in a global, orientation-aligned frame, a Graph Attention Network to encode skeletal structure, and a non-autoregressive Transformer to fuse spatial and temporal dynamics into unified predictions, all trained end-to-end. Evaluations on Human3.6M, CMU-Mocap, and the newly released DARKO dataset demonstrate competitive pose accuracy and improved trajectory prediction with a smaller, faster model suitable for real-time robotic use. The DARKO dataset and accompanying code aim to advance research in human-robot interaction and navigation in realistic, egocentric perception settings, with potential impact on intralogistics and service robots.

Abstract

We introduce a unified approach to forecast the dynamics of human keypoints along with the motion trajectory based on a short sequence of input poses. While many studies address either full-body pose prediction or motion trajectory prediction, only a few attempt to merge them. We propose a motion transformation technique to simultaneously predict full-body pose and trajectory keypoints in a global coordinate frame. We utilize an off-the-shelf 3D human pose estimation module, a graph attention network to encode the skeleton structure, and a compact, non-autoregressive transformer suitable for real-time motion prediction for human-robot interaction and human-aware navigation. We introduce a human navigation dataset, "DARKO", with a specific focus on navigational activities that are relevant for human-aware mobile robot navigation. We perform an extensive evaluation on Human3.6M, CMU-Mocap, and our DARKO dataset. In comparison to prior work, we show that our approach is compact, real-time, and accurate in predicting human navigation motion across all datasets. Result animations, our dataset, and code will be available at https://nisarganc.github.io/UPTor-page/

Paper Structure

This paper contains 13 sections, 4 equations, 7 figures, 4 tables.

Figures (7)

  • Figure 1: Human keypoints estimated from the robot’s perception stack and human motion prediction in global 3D coordinates from our UPTor model.
  • Figure 2: UPTor: Unified 3D Human Pose Dynamics and Trajectory Prediction Transformer
  • Figure 3: Left: A top-down view of four color-coded motion sequences, with their corresponding transformed sequence encircled. Faded colors mark the motion start. Right: Transformation of a single sequence. Original motion is represented by a green trajectory and skeleton. Transformation parameters include the angle $\theta$ between the motion direction and the positive x-axis, and the translation vector $v$ at $T_1$.
  • Figure 4: Distribution of locomotion velocities and participants' heights in the DARKO dataset.
  • Figure 5: Human3.6M dataset predictions across a 2-second horizon at 10 Hz
  • ...and 2 more figures
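
The Figure 3 caption describes the motion transformation through two parameters: the angle $\theta$ between the motion direction and the positive x-axis, and the translation vector $v$ at $T_1$. The sketch below shows one way such a transformation could look in NumPy. It is an illustration under stated assumptions, not the authors' implementation: the function names, the choice of keypoint 0 as the root, the z-up axis convention, estimating the motion direction from the root displacement between the first and last frame, and translating only in the ground plane are all hypothetical choices that this page does not specify.

```python
import numpy as np


def transform_sequence(poses, root_idx=0):
    """Move a pose sequence into an origin-centred, x-aligned frame.

    poses: array of shape (T, J, 3) with global keypoints, z pointing up
           (assumption). root_idx selects the keypoint used as the body
           position (assumption).
    Returns the transformed sequence plus (theta, v) so it can be undone.
    """
    poses = np.asarray(poses, dtype=float)
    root = poses[:, root_idx, :]

    # Translation vector v: root position at the first frame (T_1).
    # Assumption: shift only in the ground plane so heights are preserved.
    v = np.array([root[0, 0], root[0, 1], 0.0])

    # Assumption: motion direction = root displacement from first to last frame.
    # For a stationary person arctan2(0, 0) gives 0, i.e. no rotation.
    d = root[-1, :2] - root[0, :2]
    theta = np.arctan2(d[1], d[0])

    # Rotation by -theta about the z-axis maps the motion direction onto +x.
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, s, 0.0],
                    [-s, c, 0.0],
                    [0.0, 0.0, 1.0]])

    transformed = (poses - v) @ rot.T
    return transformed, theta, v


def inverse_transform(transformed, theta, v):
    """Map a sequence (e.g. a prediction) back into the original global frame."""
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, s, 0.0],
                    [-s, c, 0.0],
                    [0.0, 0.0, 1.0]])
    # rot is orthogonal, so right-multiplying by rot undoes "@ rot.T".
    return np.asarray(transformed, dtype=float) @ rot + v
```

In this sketch, a predictor would operate on the output of `transform_sequence`, and `inverse_transform` with the stored `theta` and `v` would bring its output back into global coordinates.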