STPOTR: Simultaneous Human Trajectory and Pose Prediction Using a Non-Autoregressive Transformer for Robot Following Ahead
Mohammad Mahdavian, Payam Nikdel, Mahdi TaherAhmadi, Mo Chen
TL;DR
The paper tackles the robot follow-ahead task by predicting both the future 3D body pose and the future hip trajectory of a human from observed motion. It introduces a non-autoregressive transformer with two parallel prediction heads, one for pose and one for trajectory. A Shared Attention module and an End Attention mechanism augment these heads to strengthen cross-task learning and temporal modeling. On Human3.6M, the model achieves competitive pose accuracy and improved trajectory prediction, with fast inference suitable for real-time robotics. Real-world follow-ahead experiments with a ZED2 camera and a Turtlebot2 show its practical use. The results indicate that jointly modeling pose and trajectory improves performance and enables richer following behaviors, and ablations confirm the contribution of the Shared Attention and End Attention components.
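The End Attention idea can be illustrated with a minimal NumPy sketch. It assumes that encoder outputs for the observed frames and decoder outputs for all predicted frames are concatenated along time and passed through one unmasked multi-head self-attention layer, keeping only the outputs at the predicted frames. The helper names (`end_attention`, `multi_head_self_attention`), the random weights, the frame counts, and the omission of residual connections and layer normalization are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Unmasked multi-head self-attention over a (T, D) sequence."""
    t, d = x.shape
    assert d % num_heads == 0, "model width must split evenly across heads"
    dh = d // num_heads
    # Project, then split features into heads: (H, T, dh).
    q = (x @ w_q).reshape(t, num_heads, dh).transpose(1, 0, 2)
    k = (x @ w_k).reshape(t, num_heads, dh).transpose(1, 0, 2)
    v = (x @ w_v).reshape(t, num_heads, dh).transpose(1, 0, 2)
    # Scaled dot-product scores (H, T, T); no causal mask, so every
    # frame may attend to every other frame.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(dh)
    heads = softmax(scores, axis=-1) @ v              # (H, T, dh)
    merged = heads.transpose(1, 0, 2).reshape(t, d)   # back to (T, D)
    return merged @ w_o


def end_attention(enc_out, dec_out, params, num_heads=4):
    """Hypothetical End Attention: attend over [encoder; decoder] frames."""
    seq = np.concatenate([enc_out, dec_out], axis=0)  # (T_obs + T_pred, D)
    out = multi_head_self_attention(seq, *params, num_heads=num_heads)
    return out[enc_out.shape[0]:]                     # keep predicted frames only


rng = np.random.default_rng(0)
d_model = 16
params = [rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(4)]
enc_out = rng.standard_normal((10, d_model))  # 10 observed frames (hypothetical)
dec_out = rng.standard_normal((25, d_model))  # 25 predicted frames, decoded in parallel
refined = end_attention(enc_out, dec_out, params)
print(refined.shape)  # (25, 16)
```

Because no causal mask is applied, all predicted frames are processed in a single call and each can attend to every other frame. This is what allows parallel, non-autoregressive prediction instead of generating one frame at a time.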
Abstract
In this paper, we develop a neural network model that predicts future human motion from an observed motion history. We propose a non-autoregressive transformer architecture that leverages its parallel nature for easier training and fast, accurate predictions at test time. The architecture divides human motion prediction into two parts: 1) the human trajectory, i.e., the 3D position of the hip joint over time, and 2) the human pose, i.e., the 3D positions of all other joints over time relative to a fixed hip joint. We make the two predictions simultaneously, since a shared representation can improve model performance; the model therefore consists of two sets of encoders and decoders. Two further modules complete the model. First, a multi-head attention module applied to the encoder outputs improves trajectory prediction. Second, a multi-head self-attention module applied to the encoder outputs concatenated with the decoder outputs facilitates learning of temporal dependencies. Our model is well suited for robotic applications in terms of test accuracy and speed, and it compares favorably with state-of-the-art methods. We demonstrate the real-world applicability of our work on the Robot Follow-Ahead task, a challenging yet practical case study for the proposed model.
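The trajectory/pose decomposition described above can be sketched in NumPy. The sketch assumes motion is stored as an array of shape (T, J, 3) holding absolute joint positions and that the hip is joint index 0. Both the array layout and the helper names `split_motion` and `merge_motion` are hypothetical choices for illustration, not taken from the paper.

```python
import numpy as np

HIP = 0  # assumed index of the hip (root) joint; depends on the skeleton layout


def split_motion(motion, hip=HIP):
    """Split (T, J, 3) absolute joint positions into trajectory and pose."""
    trajectory = motion[:, hip, :]                # (T, 3): hip position per frame
    others = np.delete(motion, hip, axis=1)       # (T, J-1, 3): all non-hip joints
    pose = others - trajectory[:, None, :]        # joints relative to the hip
    return trajectory, pose


def merge_motion(trajectory, pose, hip=HIP):
    """Inverse of split_motion: rebuild (T, J, 3) absolute joint positions."""
    absolute = pose + trajectory[:, None, :]      # undo the hip-relative offset
    # Re-insert the hip joint at its original index.
    return np.concatenate(
        [absolute[:, :hip], trajectory[:, None, :], absolute[:, hip:]], axis=1
    )


rng = np.random.default_rng(0)
motion = rng.standard_normal((50, 17, 3))         # 50 frames, 17 joints (hypothetical)
trajectory, pose = split_motion(motion)
print(trajectory.shape, pose.shape)               # (50, 3) (50, 16, 3)
print(np.allclose(merge_motion(trajectory, pose), motion))  # True
```

Because the pose is expressed relative to the hip, translating the whole body leaves it unchanged. The pose branch can therefore focus on body articulation while the trajectory branch models global motion.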
