Social LSTM with Dynamic Occupancy Modeling for Realistic Pedestrian Trajectory Prediction

Ahmed Alia, Mohcine Chraibi, Armin Seyfried

TL;DR

The paper addresses pedestrian trajectory prediction in crowded environments, aiming to improve physical realism without compromising displacement accuracy. It introduces a density-aware Dynamic Occupied Space (DOS) loss that augments Social LSTM with a Collision Penalty (CP) and a dynamic radius $\bar{R}$, which adapts each pedestrian's occupied space to the scene density. Evaluated on five datasets from the Lyon Festival of Lights, the model reduces the Collision Rate by up to 31% and improves ADE and FDE by 5% and 6%, respectively, on average relative to the baseline. It also outperforms ADE-Social LSTM, TTC-Social LSTM, and other baselines on most test sets, especially in heterogeneous crowds. The authors point to environmental constraints and human-environment interactions as directions for future work.

Abstract

In dynamic and crowded environments, realistic pedestrian trajectory prediction remains a challenging task due to the complex nature of human motion and the mutual influences among individuals. Deep learning models have recently achieved promising results by implicitly learning such patterns from 2D trajectory data. However, most approaches treat pedestrians as point entities, ignoring the physical space that each person occupies. To address this limitation, this paper proposes a novel deep learning model that enhances the Social LSTM with a new Dynamic Occupied Space loss function. This loss function guides Social LSTM in learning to avoid realistic collisions without increasing displacement error across different crowd densities, ranging from low to high, in both homogeneous and heterogeneous density settings. The loss achieves this by combining the average displacement error with a new collision penalty that is sensitive to scene density and individual spatial occupancy. For efficient training and evaluation, five datasets were generated from real pedestrian trajectories recorded during the Festival of Lights in Lyon 2022. Four datasets represent homogeneous crowd conditions (low, medium, high, and very high density), while the fifth corresponds to a heterogeneous density distribution. The experimental findings indicate that the proposed model not only lowers collision rates but also enhances displacement prediction accuracy in each dataset. Specifically, the model achieves up to a 31% reduction in the collision rate and reduces the average displacement error and the final displacement error by 5% and 6%, respectively, on average across all datasets compared to the baseline. Moreover, the proposed model outperforms several state-of-the-art deep learning models across most test sets.
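
For readers unfamiliar with the two displacement metrics named in the abstract, the following is a minimal sketch of how the average displacement error (ADE) and final displacement error (FDE) are commonly computed for a single pedestrian. The function name `displacement_errors` and the sample coordinates are illustrative assumptions, not taken from the paper, and the paper's exact averaging over pedestrians and scenes may differ.

```python
import math

def displacement_errors(pred, truth):
    """Return (ADE, FDE) for one pedestrian.

    pred, truth: equal-length lists of (x, y) positions in metres,
    one entry per predicted time step. Hypothetical helper, not the
    authors' code.
    """
    if not pred or len(pred) != len(truth):
        raise ValueError("pred and truth must be non-empty and equally long")
    # Euclidean distance between prediction and ground truth at each step.
    dists = [math.dist(p, t) for p, t in zip(pred, truth)]
    ade = sum(dists) / len(dists)  # mean error over all predicted steps
    fde = dists[-1]                # error at the final predicted step
    return ade, fde

# Made-up example: the prediction drifts 0 m, 1 m, then 2 m from the truth.
example_pred = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
example_truth = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
example_ade, example_fde = displacement_errors(example_pred, example_truth)
```

Both metrics measure only positional accuracy, which is why a model can score well on them while still predicting paths in which pedestrians overlap.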

Paper Structure

This paper contains 23 sections, 20 equations, 8 figures, 4 tables.

Figures (8)

  • Figure 1: Illustration of the pedestrian trajectory prediction problem, where the goal is to minimize displacement error and avoid collisions between the predicted future paths. The circles with a radius of 0.2 m represent the body of a person and its space requirement.
  • Figure 2: Overview of our model, including its training and evaluation process. In Social LSTM, S-Pooling refers to social pooling layers that share hidden states ($h$) among nearby LSTMs, while circles represent the occupied space of individuals.
  • Figure 3: Visualized examples of pedestrians within a scene at a specific frame, illustrating varying overlap ratios. Each circle with radius $R$ represents the physical space occupied by an individual. Red circles indicate pedestrians involved in collisions, with red lines showing the distance between the centers of the colliding individuals. Blue circles represent pedestrians without any overlaps.
  • Figure 5: Top view of Lyon's Festival of Lights 2022, with the trajectory tracking region highlighted in red. Figure from Dufour2025.
  • Figure 6: Dataset preparation flowchart. In the trajectory segmentation step, $a_{184}(161)$ denotes the trajectory segment of person ID 184 starting from frame 161. Here, $i$ refers to the person ID, $s$ to the initial frame (start time step), and $e$ to the final time step. Time steps are represented by frame orders. In the density classification step, $D$ denotes the density.
  • ...and 3 more figures
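
The circle-overlap idea described in the captions of Figures 1 and 3 can be sketched as follows: each pedestrian is a circle of radius $R$, and two pedestrians collide when the distance between their centers is less than $2R$. This is a hedged illustration only. The function names, the strict inequality, the per-frame "fraction involved" statistic, and the sample positions are assumptions; the default radius of 0.2 m is the value quoted for Figure 1, and the paper's dynamic radius and exact Collision Rate definition are not reproduced here.

```python
import math
from itertools import combinations

def colliding_pairs(positions, radius=0.2):
    """Find overlapping pedestrians in one frame.

    positions: dict mapping pedestrian ID -> (x, y) in metres.
    Returns a list of (id_a, id_b, center_distance) for every pair
    whose circles of the given radius overlap (assumed: distance < 2R).
    """
    pairs = []
    for (a, pa), (b, pb) in combinations(sorted(positions.items()), 2):
        d = math.dist(pa, pb)
        if d < 2 * radius:
            pairs.append((a, b, d))
    return pairs

def collision_fraction(positions, radius=0.2):
    """Fraction of pedestrians in the frame involved in any overlap.

    Illustrative statistic only, not the paper's Collision Rate.
    """
    if not positions:
        return 0.0
    involved = set()
    for a, b, _ in colliding_pairs(positions, radius):
        involved.update((a, b))
    return len(involved) / len(positions)

# Made-up frame: pedestrians 1 and 2 are 0.3 m apart, pedestrian 3 is far away.
frame = {1: (0.0, 0.0), 2: (0.3, 0.0), 3: (2.0, 2.0)}
frame_pairs = colliding_pairs(frame)
frame_fraction = collision_fraction(frame)
```

In this frame only pedestrians 1 and 2 overlap, since 0.3 m is below the 0.4 m threshold; passing a larger or smaller `radius` shows how the choice of occupied space changes which pairs count as collisions.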