Long-Term Human Trajectory Prediction using 3D Dynamic Scene Graphs

Nicolas Gorlo, Lukas Schmid, Luca Carlone

TL;DR

A novel approach for long-term human trajectory prediction in indoor human-centric environments, which is essential for long-horizon robot planning. The approach uses Large Language Models (LLMs) to predict a human's interactions with the environment, conditioning the prediction on rich contextual information about the scene.

Abstract

We present a novel approach for long-term human trajectory prediction in indoor human-centric environments, which is essential for long-horizon robot planning in these environments. State-of-the-art human trajectory prediction methods are limited by their focus on collision avoidance and short-term planning, and their inability to model complex interactions of humans with the environment. In contrast, our approach overcomes these limitations by predicting sequences of human interactions with the environment and using this information to guide trajectory predictions over a horizon of up to 60s. We leverage Large Language Models (LLMs) to predict interactions with the environment by conditioning the LLM prediction on rich contextual information about the scene. This information is given as a 3D Dynamic Scene Graph that encodes the geometry, semantics, and traversability of the environment into a hierarchical representation. We then ground these interaction sequences into multi-modal spatio-temporal distributions over human positions using a probabilistic approach based on continuous-time Markov Chains. To evaluate our approach, we introduce a new semi-synthetic dataset of long-term human trajectories in complex indoor environments, which also includes annotations of human-object interactions. We show in thorough experimental evaluations that our approach achieves a 54% lower average negative log-likelihood and a 26.5% lower Best-of-20 displacement error compared to the best non-privileged (i.e., evaluated in a zero-shot fashion on the dataset) baselines for a time horizon of 60s.
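
The abstract reports a Best-of-20 displacement error without defining it in this excerpt. As a rough sketch, assuming the common convention that a "Best-of-N" metric takes the minimum error over N sampled trajectories, it could be computed as below. The function names, the use of the time-averaged Euclidean distance, and the toy data are illustrative assumptions, not the paper's evaluation code.

```python
import math

def average_displacement(pred, truth):
    """Mean Euclidean distance between two equally sampled 2D trajectories."""
    return sum(math.dist(p, q) for p, q in zip(pred, truth)) / len(truth)

def best_of_n(samples, truth, n=20):
    """Smallest average displacement among the first n sampled trajectories.

    Assumption: "Best-of-N" means the minimum error over N samples.
    """
    return min(average_displacement(s, truth) for s in samples[:n])

if __name__ == "__main__":
    # Toy data (hypothetical): one ground-truth path and two sampled predictions.
    truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    samples = [
        [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)],  # offset by 1 m at every step
        [(0.0, 0.0), (1.0, 0.0), (2.0, 0.5)],  # deviates only at the last step
    ]
    print(best_of_n(samples, truth))
```

Taking the minimum rewards a multi-modal predictor for covering the true outcome with at least one sample, which is why such metrics are usually reported alongside a likelihood-based score such as the negative log-likelihood.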

Paper Structure (8 sections, 4 equations, 6 figures, 5 tables)


Figures (6)

  • Figure 1: Our method, LP$^2$, predicts a spatio-temporal distribution over long-term (up to $60s$) human trajectories in complex environments by reasoning about their interactions with the scene, represented as a 3D Dynamic Scene Graph.
  • Figure 2: Approach overview. The interaction sequence prediction (ISP) module estimates sequences of future interactions with the environment using the rich semantic information of the scene graph $S$. The probabilistic trajectory prediction (PTP) module connects these sequences into cohesive trajectories and predicts a continuous spatio-temporal probability distribution over the future human position in the environment.
  • Figure 3: Interaction sequence tree $T_I$. The LLM is auto-regressively prompted with the scene $S$ and the previous interactions $A_p$ to predict the next interactions (green nodes). Shortest paths spatially connect interactions (gray nodes). Progression through the tree over time is modeled as a CTMC with probability-based transition rates (blue). As time progresses, the probability mass moves from left to right.
  • Figure 4: Spatio-temporal distribution over the human's predicted position (orange) progressing through time. The true trajectory is shown in green, shaded from light at $t=0s$ to dark at $t=60s$.
  • Figure 5: Example trajectories in the office (left) and home (right). Trajectory is shown in green (past) and red (future).
  • ...and 1 more figure
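
The Figure 3 caption describes progression through the interaction sequence tree as a CTMC with probability-based transition rates, with probability mass moving through the tree over time. The following is a minimal sketch of that idea, not the authors' implementation. The tiny tree, the node meanings, the 20 s mean dwell time, and the rule "transition rate = branch probability × exit rate" are all illustrative assumptions. The state distribution $p(t) = p(0)\,e^{Qt}$ is evaluated with uniformization, which is one standard way to do so.

```python
import math

def build_generator(children, mean_dwell_s):
    """Build a CTMC generator matrix Q from a tree of branch probabilities.

    children[i] is a list of (child_index, probability) pairs; states without
    an entry are absorbing leaves. Hypothetical construction: every non-leaf
    state is left at rate 1 / mean_dwell_s, split by branch probability.
    """
    indices = list(children) + [c for kids in children.values() for c, _ in kids]
    n = 1 + max(indices)
    Q = [[0.0] * n for _ in range(n)]
    exit_rate = 1.0 / mean_dwell_s
    for parent, kids in children.items():
        for child, prob in kids:
            Q[parent][child] = prob * exit_rate
        # Each row of a generator matrix sums to zero.
        Q[parent][parent] = -sum(Q[parent][c] for c, _ in kids)
    return Q

def ctmc_distribution(Q, p0, t, tol=1e-12, max_terms=10_000):
    """Return p(t) = p0 * exp(Q t) using uniformization."""
    n = len(p0)
    lam = max(-Q[i][i] for i in range(n))  # uniformization rate
    if lam == 0.0 or t == 0.0:
        return list(p0)
    # Jump matrix of the uniformized discrete-time chain: P = I + Q / lam.
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / lam for j in range(n)]
         for i in range(n)]
    weight = math.exp(-lam * t)  # Poisson(lam * t) probability of 0 jumps
    term = list(p0)              # p0 * P^k, starting at k = 0
    result = [weight * x for x in term]
    total_weight = weight
    for k in range(1, max_terms):
        if 1.0 - total_weight <= tol:
            break
        term = [sum(term[i] * P[i][j] for i in range(n)) for j in range(n)]
        weight *= lam * t / k    # Poisson probability of k jumps
        total_weight += weight
        result = [r + weight * x for r, x in zip(result, term)]
    return result

if __name__ == "__main__":
    # Hypothetical tree: state 0 is the start, states 1 and 2 are two candidate
    # next interactions (probabilities 0.7 and 0.3), state 3 follows state 1.
    tree = {0: [(1, 0.7), (2, 0.3)], 1: [(3, 1.0)]}
    Q = build_generator(tree, mean_dwell_s=20.0)
    for t in (0.0, 20.0, 60.0):
        p = ctmc_distribution(Q, [1.0, 0.0, 0.0, 0.0], t)
        print(t, [round(x, 4) for x in p])
```

In this sketch all probability starts at the root and drains toward the leaves as `t` grows, which mirrors the left-to-right movement of probability mass described in the caption. Mapping the per-state probabilities to positions in the environment is outside the scope of the example.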