
TrajPRed: Trajectory Prediction with Region-based Relation Learning

Chen Zhou, Ghassan AlRegib, Armin Parchami, Kunjan Singh

TL;DR

TrajPRed tackles safe trajectory forecasting in traffic scenes by jointly modeling social interactions and stochastic goals. It introduces region-based relation learning, which encodes local joint agent dynamics as trajectory density maps and extracts region-wise feature grids from them with a convolutional autoencoder. It couples this with a conditional variational autoencoder (CVAE) for multi-goal estimation, which captures diverse future intents. The framework fuses region-based relations with stochastic goal conditioning to predict future trajectories $\hat{Y}$ from history $X$. It achieves state-of-the-art performance on the Stanford Drone Dataset (SDD), shows strong gains on ETH-UCY, and is less susceptible to perturbations in agent states. By producing diverse, realistic predictions, these methods support safe planning for mixed autonomous systems.
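The TL;DR describes encoding local joint agent dynamics as trajectory density maps. As a minimal sketch, assuming a map is simply a per-frame count of agents in each cell of a fixed square grid over the scene (the grid extent, the resolution, and the helper names `trajectory_density_map` and `density_sequence` are hypothetical, not taken from the paper), rasterization could look like this. The convolutional autoencoder that TrajPRed applies to such maps is not reproduced here.

```python
from math import dist
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Grid = List[List[int]]


def trajectory_density_map(
    positions: Sequence[Point],
    x_range: Tuple[float, float],
    y_range: Tuple[float, float],
    grid_size: int,
) -> Grid:
    """Count how many agents fall into each cell of a grid_size x grid_size grid.

    Hypothetical helper: per-frame count rasterization is an assumption, not
    TrajPRed's published map construction. Returns density[row][col], where
    row indexes y and col indexes x.
    """
    x_min, x_max = x_range
    y_min, y_max = y_range
    cell_w = (x_max - x_min) / grid_size
    cell_h = (y_max - y_min) / grid_size
    density = [[0] * grid_size for _ in range(grid_size)]
    for x, y in positions:
        # Agents outside the mapped area are ignored.
        if not (x_min <= x < x_max and y_min <= y < y_max):
            continue
        # min() guards against float rounding pushing an index past the last cell.
        col = min(int((x - x_min) / cell_w), grid_size - 1)
        row = min(int((y - y_min) / cell_h), grid_size - 1)
        density[row][col] += 1
    return density


def density_sequence(frames, x_range, y_range, grid_size) -> List[Grid]:
    """One map per observed frame; in this sketch the stack stands in for the maps M."""
    return [trajectory_density_map(f, x_range, y_range, grid_size) for f in frames]


if __name__ == "__main__":
    scene = [(0.5, 0.5), (0.6, 0.4), (3.5, 3.5)]
    jittered = [(0.55, 0.45), (0.6, 0.4), (3.45, 3.55)]  # small state perturbation
    m = trajectory_density_map(scene, (0.0, 4.0), (0.0, 4.0), grid_size=4)
    m_jit = trajectory_density_map(jittered, (0.0, 4.0), (0.0, 4.0), grid_size=4)
    print(m == m_jit)  # → True: the cell counts are unchanged
    # The pairwise distance an edge-based model would use does change (about halved).
    print(round(dist(scene[0], scene[1]), 4), round(dist(jittered[0], jittered[1]), 4))
```

The toy usage shows the intuition behind the region-based paradigm: a small position jitter halves the distance between two nearby agents, which an edge-based relation would see directly, while the coarse cell counts stay identical. This is an illustration only, not the paper's robustness evaluation.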

Abstract

Forecasting human trajectories in traffic scenes is critical for safety within mixed or fully autonomous systems. Human future trajectories are driven by two major stimuli: social interactions and stochastic goals. Reliable forecasting therefore needs to capture both. Edge-based relation modeling represents social interactions using pairwise correlations computed from precise individual states. Because it depends on those precise states, edge-based relation modeling can be vulnerable to perturbations. To alleviate this issue, we propose a region-based relation learning paradigm that models social interactions via region-wise dynamics of joint states, i.e., the changes in crowd density. In particular, region-wise joint agent information is encoded within convolutional feature grids. Social relations are modeled by relating the temporal changes of local joint information from a global perspective. We show that region-based relations are less susceptible to perturbations. To account for stochastic individual goals, we exploit a conditional variational autoencoder (CVAE) to realize multi-goal estimation and diverse future prediction. Specifically, we perform variational inference via a latent distribution that is conditioned on the correlation between input states and the associated target goals. Sampling from this latent distribution enables the framework to reliably capture the stochastic behavior in test data. We integrate multi-goal estimation and region-based relation learning in a single prediction framework to model both stimuli. We evaluate our framework on the ETH-UCY dataset and the Stanford Drone Dataset (SDD), and show that the diverse predictions fit the ground truth better when the relation module is incorporated. Our framework outperforms state-of-the-art models on SDD by $27.61\%$ and $18.20\%$ on the ADE and FDE metrics, respectively.
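ADE and FDE are the displacement metrics behind the reported SDD gains. The sketch below uses their usual definitions (mean and final L2 error over the predicted horizon) plus a best-of-K variant commonly used to score stochastic predictors. The value of K and the exact protocol TrajPRed follows are assumptions here, as is reading the $27.61\%$/$18.20\%$ figures as relative error reductions against a baseline, which is what the hypothetical `relative_improvement` helper computes.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]
Trajectory = Sequence[Point]


def _check(pred: Trajectory, gt: Trajectory) -> None:
    if len(pred) != len(gt) or not gt:
        raise ValueError("pred and gt must be non-empty and of equal length")


def ade(pred: Trajectory, gt: Trajectory) -> float:
    """Average Displacement Error: mean L2 distance over all predicted time steps."""
    _check(pred, gt)
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)


def fde(pred: Trajectory, gt: Trajectory) -> float:
    """Final Displacement Error: L2 distance at the last predicted time step."""
    _check(pred, gt)
    return math.dist(pred[-1], gt[-1])


def best_of_k(samples: Sequence[Trajectory], gt: Trajectory) -> Tuple[float, float]:
    """minADE_K and minFDE_K over K sampled futures, each minimum taken independently."""
    return min(ade(s, gt) for s in samples), min(fde(s, gt) for s in samples)


def relative_improvement(baseline: float, ours: float) -> float:
    """Percent error reduction versus a baseline (positive means lower error)."""
    return 100.0 * (baseline - ours) / baseline


if __name__ == "__main__":
    gt = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    samples = [
        [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)],  # drifts upward
        [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0)],  # off only at the final step
    ]
    print(best_of_k(samples, gt))  # ADE 1/3 and FDE 1.0, both from the second sample
    print(relative_improvement(10.0, 7.5))  # → 25.0
```

Here the two minima are taken independently over the K samples; some works instead report the ADE of the sample with the lowest FDE, so numbers are only comparable under the same convention.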

Paper Structure

This paper contains 16 sections, 1 equation, 6 figures, and 5 tables.

Figures (6)

  • Figure 1: Example scenarios in the ETH-UCY dataset that reflect two major stimuli, social interactions and stochastic goals, in human trajectory prediction. (a) shows an example of interaction through group avoidance: two groups of people walking toward each other avoid one another along their respective future trajectories. (b) shows an example of multiple stochastic future destinations (goals): a pedestrian approaching the curbside might make a turn or keep the original direction. Plausible goals are represented by the star symbols. Solid lines represent historical trajectories, while dashed lines represent future trajectories.
  • Figure 2: Two relation modeling paradigms. In an edge-based setting, the relations of individuals are modeled separately with pairwise correlations based on accurate states. In a region-based setting, relations are modeled with joint representations within each region instead of accurate individual states.
  • Figure 3: The overall diagram of the proposed trajectory prediction framework. The region-based joint relations are learned via the Relation Module using the trajectory maps $M$. The History Encoder and the Future Encoder encode the individual observed trajectory patterns $X$ and the future trajectory patterns $Y$, respectively. The Multi-goal Estimation generates multiple end positions conditioned on the encoded features from the history and future encoders. The Future Decoder predicts the future trajectories $\hat{Y}$ using the combination of joint relation features, individual history trajectory features, and estimated end positions. The orange dashed arrows and the black arrows stand for the data flow in training only and in both training and inference, respectively.
  • Figure 4: The overall two-stage workflow of the proposed regional relation module. In the first stage, region-wise agent joint information is encoded within the grids of the latent feature maps $f_s$ via auto-encoding. In the second stage, the intra-regional interaction features $h_{st}$ are extracted via Temporal Encoding using the latent feature grids $f_s^m, \dots, f_s^n$ obtained by the pre-trained autoencoder. The inter-regional relation representations $R_{st}$ are learned by relating the agent-specific $h_{st}$ evolving across regions in the Regional Relation Extraction.
  • Figure 5: The workflow of stochastic goal estimation. The orange dashed arrows, the blue arrows, and the black arrows stand for the data flow in training only, inference only, and both training and inference, respectively.
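Figure 5's split between training-only and inference-only data flow matches the usual CVAE recipe for stochastic goal estimation. The sketch below assumes that generic recipe rather than TrajPRed's actual networks: a recognition distribution $q(z \mid X, Y)$ is used in training and pulled toward a prior $p(z \mid X)$ by a KL term, and at inference K latent codes are drawn from the prior via the reparameterization trick and decoded into K candidate goals. The decoder `decode_goal` (last observed position plus the first two latent dimensions) is a hypothetical stand-in for a learned goal decoder, and the Gaussian parameters are passed in directly instead of being predicted by networks.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def reparameterize(mu: Sequence[float], logvar: Sequence[float], rng: random.Random) -> List[float]:
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_divergence(mu_q, logvar_q, mu_p, logvar_p) -> float:
    """KL(q || p) between diagonal Gaussians, summed over latent dimensions.

    In this generic setup, q is the recognition distribution q(z | X, Y) and
    p is the prior p(z | X); the term regularizes q toward p during training.
    """
    total = 0.0
    for mq, lq, mp, lp in zip(mu_q, logvar_q, mu_p, logvar_p):
        total += 0.5 * (lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0)
    return total


def decode_goal(last_position: Point, z: Sequence[float]) -> Point:
    """Hypothetical goal decoder: last observed position offset by the first two latent dims.

    A real model would use a learned network conditioned on history features.
    """
    return (last_position[0] + z[0], last_position[1] + z[1])


def sample_goals(last_position: Point, mu_p, logvar_p, k: int, seed: int = 0) -> List[Point]:
    """Inference path: draw K latents from the prior p(z | X) and decode K candidate goals."""
    rng = random.Random(seed)
    return [decode_goal(last_position, reparameterize(mu_p, logvar_p, rng)) for _ in range(k)]


if __name__ == "__main__":
    goals = sample_goals((2.0, 3.0), mu_p=[0.5, -0.5], logvar_p=[0.0, 0.0], k=5, seed=1)
    for g in goals:
        print(g)  # five candidate end positions scattered around (2.5, 2.5)
```

Drawing several latents per agent is what yields multiple plausible goals, such as the turn-or-continue case in Figure 1(b); the spread of the candidates is governed by the prior's variance.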