Recurrent Aligned Network for Generalized Pedestrian Trajectory Prediction

Yonghao Dong, Le Wang, Sanping Zhou, Gang Hua, Changyin Sun

TL;DR

This paper tackles generalized pedestrian trajectory prediction by removing the need for target-domain data during training. It introduces the Recurrent Aligned Network (RAN), which combines a pre-aligned representation with a recurrent alignment module to minimize domain gaps at both time-state and time-sequence levels while incorporating social interactions. The model achieves state-of-the-art cross-domain generalization on ETH-UCY, SDD, and NBA, with notable improvements in ADE/FDE and evidence of stronger performance in domains with larger shifts, such as NBA. The approach offers practical value for autonomous driving and surveillance by enabling robust trajectory prediction across unseen environments. The work also presents ablations confirming the importance of both alignment components and discusses future work on reducing the requirement for multiple source domains.
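The ADE/FDE metrics mentioned above are the standard average and final displacement errors. The following is a minimal sketch of how they are commonly computed for a single predicted trajectory; the function name `displacement_errors` and the toy coordinates are illustrative assumptions, and the paper's exact evaluation protocol (for example, whether the best of several sampled predictions is scored) is not stated in this summary.

```python
import math

def displacement_errors(pred, truth):
    """Return (ADE, FDE) for one trajectory given as lists of (x, y) points.

    ADE: mean Euclidean distance over all predicted time steps.
    FDE: Euclidean distance at the last predicted time step.
    """
    if len(pred) != len(truth) or not pred:
        raise ValueError("trajectories must be non-empty and equally long")
    dists = [
        math.hypot(px - tx, py - ty)
        for (px, py), (tx, ty) in zip(pred, truth)
    ]
    ade = sum(dists) / len(dists)
    fde = dists[-1]
    return ade, fde

# Toy example (hypothetical data): the prediction drifts upward by 1 unit per step.
truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
pred = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
ade, fde = displacement_errors(pred, truth)
print(ade, fde)  # 1.0 2.0
```

Lower values are better for both metrics, which matches the "lower is better" convention used in the ablation figures.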

Abstract

Pedestrian trajectory prediction is a crucial component in computer vision and robotics, but remains challenging due to the domain shift problem. Previous studies have tried to tackle this problem by leveraging a portion of the trajectory data from the target domain to adapt the model. However, such domain adaptation methods are impractical in real-world scenarios, as it is infeasible to collect trajectory data from all potential target domains. In this paper, we study a task named generalized pedestrian trajectory prediction, with the aim of generalizing the model to unseen domains without accessing their trajectories. To tackle this task, we introduce a Recurrent Aligned Network (RAN) to minimize the domain gap through domain alignment. Specifically, we devise a recurrent alignment module that uses a recurrent alignment strategy to align the trajectory feature spaces at both time-state and time-sequence levels. Furthermore, we introduce a pre-aligned representation module to combine social interactions with the recurrent alignment strategy, so that the alignment process considers social interactions rather than only the target trajectories. We extensively evaluate our method and compare it with state-of-the-art methods on three widely used benchmarks. The experimental results demonstrate the superior generalization capability of our method. Our work not only fills the gap in the generalization setting for practical pedestrian trajectory prediction but also sets strong baselines in this field.
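The generalization setting described in the abstract can be illustrated as a leave-one-domain-out split: train on several source domains and test on a domain whose trajectories were never seen. The sketch below only illustrates that split; the helper name `leave_one_domain_out` is hypothetical, and treating each of the three benchmarks as one domain is an assumption for the example.

```python
def leave_one_domain_out(domains):
    """Yield (source_domains, target_domain) pairs.

    The target domain is held out entirely, so no target trajectories
    are available during training.
    """
    for target in domains:
        sources = [d for d in domains if d != target]
        yield sources, target

# Assumption for illustration: each benchmark is treated as one domain.
benchmarks = ["ETH-UCY", "SDD", "NBA"]
splits = list(leave_one_domain_out(benchmarks))
for sources, target in splits:
    print(f"train on {sources}, test on {target}")
```

The first split (train on SDD and NBA, test on ETH-UCY) corresponds to the configuration named in the captions of Figures 4 and 5.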

Paper Structure (19 sections, 11 equations, 7 figures, 9 tables, 1 algorithm)

This paper contains 19 sections, 11 equations, 7 figures, 9 tables, 1 algorithm.

Figures (7)

  • Figure 1: (a) Performance drop of SocialVAE (Xu et al., 2022) under domain shift, showing that current methods do not handle the domain shift challenge well in pedestrian trajectory prediction. 'A to B' below the X-axis denotes that the model is trained on source dataset 'A' and tested on target dataset 'B'. (b) Definition of generalized pedestrian trajectory prediction. In this generalization setting, the model is trained on multiple source domain datasets and tested on the target domain dataset.
  • Figure 2: The framework of the training phase of RAN. RAN needs two source domains during training. Given trajectories from two different domains, we first model the pre-aligned representation of the trajectories with the stepwise attention layers. Then we align the two domains' representations at both the time-state and time-sequence levels using the recurrent alignment module. Finally, we decode the well-aligned features (generalized trajectory features) into trajectory predictions. Note that the RNN layers share weights across the two domains, as do the stepwise attention layers.
  • Figure 3: The inference phase of the proposed RAN framework. The input data come from target domain scenarios only; unlike training, inference does not require data from two or more domains.
  • Figure 4: Ablation study of different alignment approaches on the ETH-UCY dataset. The model is trained on SDD and NBA. Lower is better.
  • Figure 5: Ablation study of different loss coefficients on the ETH-UCY dataset. The model is trained on SDD and NBA. The X-axis is $\lambda_1 / \lambda_2$; the Y-axis is ADE or FDE. Lower is better.
  • ...and 2 more figures