Long-Term Human Motion Prediction Using Spatio-Temporal Maps of Dynamics

Yufei Zhu, Andrey Rudenko, Tomasz P. Kucner, Achim J. Lilienthal, Martin Magnusson

TL;DR

The paper tackles long-term human motion prediction (LHMP) by leveraging Maps of Dynamics (MoDs), which encode environment-driven motion patterns, to support prediction horizons of up to 60 seconds. It generalizes CLiFF-LHMP into the MoD-LHMP framework, which allows different MoDs to be substituted (e.g., CLiFF-map, Time-Conditioned CLiFF-map, STeF-map), introduces a ranking mechanism that outputs the most likely trajectory, and adds a Time-Conditioned MoD to handle diurnal changes. Across two real indoor datasets, ATC and Edinburgh, MoD-LHMP outperforms state-of-the-art deep learning baselines, with up to a 50% reduction in average displacement error (ADE) at long horizons, and the time-conditioned variant achieves the strongest overall accuracy. The approach runs in real time on CPU and yields feasible trajectories that align with environmental topology, highlighting its practical value for robotics in human-shared environments.

Abstract

Long-term human motion prediction (LHMP) is important for the safe and efficient operation of autonomous robots and vehicles in environments shared with humans. Accurate predictions are important for applications including motion planning, tracking, human-robot interaction, and safety monitoring. In this paper, we exploit Maps of Dynamics (MoDs), which encode spatial or spatio-temporal motion patterns as environment features, to achieve LHMP for horizons of up to 60 seconds. We propose an MoD-informed LHMP framework that supports various types of MoDs and includes a ranking method to output the most likely predicted trajectory, improving practical utility in robotics. Further, a time-conditioned MoD is introduced to capture motion patterns that vary across different times of day. We evaluate MoD-LHMP instantiated with three types of MoDs. Experiments on two real-world datasets show that the MoD-informed method outperforms learning-based ones, with up to 50% improvement in average displacement error, and that the time-conditioned variant achieves the highest accuracy overall. Project code is available at https://github.com/test-bai-cpu/LHMP-with-MoDs.git
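
The average displacement error (ADE) mentioned above is commonly defined as the mean Euclidean distance between predicted and ground-truth positions over all predicted time steps. The snippet below is a minimal sketch of that standard definition, not the authors' evaluation code; the function name `average_displacement_error` and the toy trajectories are assumptions made for illustration.

```python
import math


def average_displacement_error(predicted, ground_truth):
    """Mean Euclidean distance between time-aligned 2D positions.

    Both arguments are sequences of (x, y) tuples sampled at the same
    time steps. This follows the common ADE definition; it is an
    illustrative sketch, not the paper's evaluation code.
    """
    if len(predicted) != len(ground_truth) or len(predicted) == 0:
        raise ValueError("trajectories must be non-empty and equally long")
    total = sum(math.dist(p, g) for p, g in zip(predicted, ground_truth))
    return total / len(predicted)


# Toy data (hypothetical): the prediction goes straight along x,
# while the ground truth drifts upward by 1 m per step.
predicted = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
ground_truth = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]

ade = average_displacement_error(predicted, ground_truth)
print(ade)  # per-step errors are 0, 1 and 2, so the mean is 1.0
```

Because ADE averages over every step of the horizon, a 50% reduction means the predicted path stays closer to the true path on average across the whole horizon, not only at the final position.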

Paper Structure

This paper contains 22 sections, 2 equations, 10 figures, 2 tables, 3 algorithms.

Figures (10)

  • Figure 1: Time-conditioned CLiFF-map in the ATC dataset, for 10:00 (left), 14:00 (middle) and 18:00 (right), showing changes of motion patterns throughout the day represented by CLiFF-map. At each location, the colored arrow shows the mean of the Gaussian component with maximum weight, where the arrow color encodes orientation and the arrow length encodes speed.
  • Figure 2: STeF-map in the ATC dataset, for 10:00 (left), 14:00 (middle) and 18:00 (right), showing changes of motion patterns throughout the day represented by STeF-map. At each location, the colored arrow shows the dominant orientation for each cell in STeF-map, where the arrow color encodes orientation.
  • Figure 3: A focused view of CLiFF-maps at one location in the east corridor of the ATC dataset. For each hour from 9:00 to 21:00, Time-Conditioned CLiFF-maps of the example location are shown, together with the general CLiFF-map of the whole day at the same location. Arrows show the mean of each component in the semi-wrapped Gaussian mixture model (SWGMM), jointly representing speed and orientation. Arrow length encodes speed, while arrow transparency reflects the component weight (lighter arrows correspond to smaller weights).
  • Figure 4: Edinburgh dataset MoDs, showing motion pattern changes from 09:00 (first row) to 14:00 (second row). In Time-Conditioned CLiFF-map, colored arrows show the mean of the Gaussian component with transparency indicating the component weights. In STeF-map, colored arrows show each discretized orientation, with transparency reflecting their corresponding probabilities. In both maps, arrow color encodes orientation.
  • Figure 5: Examples of predicted trajectory rankings using TC-CLiFF-LHMP in the ATC (left) and Edinburgh (right) datasets. The red line represents the ground truth trajectory, the green line represents the observed trajectory, and blue lines represent the predicted trajectories, with darker shades of blue indicating higher-ranked predictions. Predictions in darker blue show higher accuracy, showcasing the effectiveness of the ranking mechanism. The right figure also provides a view from the camera in the Edinburgh dataset. The green area shows the marginal region where trajectories start and end. The red area shows the region where people are waiting for a lift.
  • ...and 5 more figures
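
The arrow rendering described in the Figure 1 caption (one arrow per location, taken from the mean of the maximum-weight Gaussian component, with length encoding speed and direction encoding orientation) can be sketched as follows. This is only an illustration under stated assumptions: the `(weight, speed, orientation)` tuple layout, the function name `dominant_arrow`, and the sample values are hypothetical and are not taken from the paper or its code.

```python
import math


def dominant_arrow(components):
    """Return the (dx, dy) arrow for one map location.

    `components` is a list of (weight, mean_speed, mean_orientation_rad)
    tuples, one per mixture component at that location. This layout is
    an assumption for illustration. The arrow points along the mean
    orientation of the heaviest component, and its length is that
    component's mean speed.
    """
    if not components:
        raise ValueError("location has no mixture components")
    _, speed, orientation = max(components, key=lambda c: c[0])
    return (speed * math.cos(orientation), speed * math.sin(orientation))


# Hypothetical location with three components; the second has the
# largest weight (0.7), so the arrow points along +y with length 1.2.
cell = [
    (0.2, 0.5, math.pi),
    (0.7, 1.2, math.pi / 2),
    (0.1, 0.8, 0.0),
]
dx, dy = dominant_arrow(cell)
print(round(dx, 6), round(dy, 6))
```

Showing only the heaviest component keeps the map readable. Figure 3 instead draws every component and uses transparency to convey the weights.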