Physical Plausibility-aware Trajectory Prediction via Locomotion Embodiment

Hiromu Taketsugu, Takeru Oba, Takahiro Maeda, Shohei Nobuhara, Norimichi Ukita

TL;DR

The paper tackles the problem of physically implausible predictions in Human Trajectory Prediction (HTP) by introducing Locomotion Embodiment, which integrates a physics-grounded locomotion generator with a differentiable plausibility surrogate, LocoVal. Training uses the EmLoco loss to supervise multi-head, stochastic HTP predictions with pose-trajectory consistency, while a LocoVal filter enables at-inference pruning of implausible trajectories. The approach leverages a two-stage training pipeline: first learning a differentiable surrogate from a physics-based simulator, then training the HTP network with pose cues and the EmLoco objective across multiple datasets (JTA, JRDB, ETH/UCY), showing state-of-the-art improvements in ADE/FDE and plausibility metrics. The results demonstrate practical gains for safety-critical applications and highlight the potential for plug-and-play plausibility filtering on existing HTP models, although pose accuracy and simulator fidelity remain important future considerations.
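For readers unfamiliar with the ADE/FDE metrics mentioned above, the following is a minimal sketch of how they are commonly computed for a single pedestrian. The function names, the 2D `(x, y)` point format, and the best-of-K convention for stochastic predictors are assumptions made for illustration and are not taken from the paper's code.

```python
import math


def displacement_errors(pred, gt):
    """Return (ADE, FDE) for one predicted trajectory.

    pred, gt: equal-length lists of (x, y) positions, one per future step.
    ADE is the mean Euclidean error over all steps; FDE is the error at
    the final step.
    """
    if len(pred) != len(gt) or not pred:
        raise ValueError("trajectories must be non-empty and equally long")
    dists = [math.dist(p, g) for p, g in zip(pred, gt)]
    return sum(dists) / len(dists), dists[-1]


def best_of_k(preds, gt):
    """Assumed convention for stochastic predictors: report the minimum
    ADE and the minimum FDE over the K candidate trajectories."""
    errors = [displacement_errors(p, gt) for p in preds]
    return min(e[0] for e in errors), min(e[1] for e in errors)


gt = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
drifting = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
ade, fde = displacement_errors(drifting, gt)
```

Here `drifting` deviates from the ground truth by 0, 1, and 2 units at the three steps, so its ADE is the mean of those distances and its FDE is the last one.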

Abstract

Humans can predict future human trajectories even from momentary observations by using human pose-related cues. However, previous Human Trajectory Prediction (HTP) methods leverage the pose cues implicitly, resulting in implausible predictions. To address this, we propose Locomotion Embodiment, a framework that explicitly evaluates the physical plausibility of the predicted trajectory by locomotion generation under the laws of physics. While the plausibility of locomotion is learned with a non-differentiable physics simulator, it is replaced by our differentiable Locomotion Value function to train an HTP network in a data-driven manner. In particular, our proposed Embodied Locomotion loss is beneficial for efficiently training a stochastic HTP network using multiple heads. Furthermore, the Locomotion Value filter is proposed to filter out implausible trajectories at inference. Experiments demonstrate that our method enhances even the state-of-the-art HTP methods across diverse datasets and problem settings. Our code is available at: https://github.com/ImIntheMiddle/EmLoco.
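The inference-time filtering idea can be illustrated with a small sketch. Everything below is hypothetical: `toy_value` is a hand-written stand-in for a learned plausibility score (it simply penalizes the largest acceleration a trajectory would require given an initial velocity), and keeping the top-scoring candidates is an assumed selection rule. Neither reflects the actual Locomotion Value function or filter implementation.

```python
import math


def toy_value(init_velocity, trajectory, dt):
    """Hypothetical plausibility score (higher = more plausible).

    Stand-in for a learned value function: returns the negated peak
    acceleration needed to follow `trajectory` (a list of (x, y) points
    spaced `dt` seconds apart) when starting at `init_velocity`.
    """
    vx, vy = init_velocity
    worst = 0.0
    prev = trajectory[0]
    for point in trajectory[1:]:
        nvx = (point[0] - prev[0]) / dt
        nvy = (point[1] - prev[1]) / dt
        worst = max(worst, math.hypot(nvx - vx, nvy - vy) / dt)
        vx, vy, prev = nvx, nvy, point
    return -worst


def filter_candidates(candidates, score_fn, keep):
    """Keep the `keep` highest-scoring candidate trajectories."""
    return sorted(candidates, key=score_fn, reverse=True)[:keep]


straight = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
reverse = [(0.0, 0.0), (-1.0, 0.0), (-2.0, 0.0)]
jump = [(0.0, 0.0), (5.0, 0.0), (5.0, 5.0)]


def score(trajectory):
    # Assumed observation: the person is moving at 1 m/s along +x.
    return toy_value((1.0, 0.0), trajectory, 1.0)


kept = filter_candidates([jump, reverse, straight], score, keep=2)
```

In this toy setting `jump` demands the largest acceleration and is the candidate that gets pruned, while `straight` ranks first because it matches the assumed initial velocity.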

Paper Structure

This paper contains 60 sections, 5 equations, 19 figures, 16 tables.

Figures (19)

  • Figure 1: Overview of our method. Unlike existing methods, which often predict physically implausible trajectories, our framework uses locomotion generation in a physics simulator to incorporate the laws of physics into HTP by training the plausibility score as the consistency between the observed 3D pose and future possible trajectories, which are indicated by $\bm{j}_{0}$ and $\bm{\tau}_{f}^{1,2,3}$, respectively, in the figure. Additionally, at inference, our method can evaluate predicted trajectories, $\hat{\bm{\tau}}_{f}^{1,2,3}$, to filter out implausible ones.
  • Figure 2: The overview of the proposed framework. Ⓣ and Ⓕ mean the weights are trained and fixed, respectively.
  • Figure 3: The overview of training our LocoVal function, $\mathcal{V}$.
  • Figure 5: Comparison of the stochastic HTP between Social-Trans [socialtransmotion] and ours with $5$ heads on the JTA dataset [JTA]. Human poses are shown at a doubled scale for visualization purposes.
  • Figure 6: Comparison of the prediction results between the baseline Social-Trans [socialtransmotion] and our method with momentary observations on the JTA dataset [JTA]. The observed poses are displayed at a doubled scale for visualization purposes.
  • ...and 14 more figures