Towards Predicting Any Human Trajectory In Context

Ryo Fujii, Hideo Saito, Ryo Hachiuma

TL;DR

TrajICL tackles the challenge of adapting pedestrian trajectory predictors to diverse real-world environments without on-device fine-tuning. It introduces STES to select spatio-temporally similar in-context demonstrations and PG-ES to refine that selection using predicted futures, and the model is trained on the large-scale synthetic MOTSynth dataset to improve generalization. The framework is implemented on a Transformer-based predictor with RCPE and SRPE and uses a two-stage training scheme (VTP and in-context training) with a min-over-K loss. Empirical results show strong in-domain and cross-domain performance, often surpassing fine-tuned baselines, and because adaptation needs no weight updates the approach suits edge devices; inference cost and the quality of the example pool remain areas for future improvement.
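The min-over-K (best-of-K) loss scores K predicted futures against the ground truth and keeps only the closest one. Below is a minimal plain-Python sketch of that computation, assuming the per-mode error is the average displacement error (ADE); the paper's exact error term is not given in this summary, and the function names are hypothetical. A training implementation would compute the same quantity in a differentiable framework.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def ade(pred: Sequence[Point], gt: Sequence[Point]) -> float:
    """Average displacement error: mean Euclidean distance over the horizon."""
    if len(pred) != len(gt) or not gt:
        raise ValueError("prediction and ground truth must be non-empty and equal length")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)


def min_over_k_loss(modes: Sequence[Sequence[Point]], gt: Sequence[Point]) -> Tuple[float, int]:
    """Best-of-K loss: the error of the closest of the K predicted modes.

    Assumption: ADE is the per-mode error. Only the winning mode contributes
    to the returned value, so the other modes are free to cover alternative
    plausible futures.
    """
    errors = [ade(m, gt) for m in modes]
    best = min(range(len(errors)), key=errors.__getitem__)
    return errors[best], best


gt = [(1.0, 0.0), (2.0, 0.0)]
modes = [
    [(1.0, 1.0), (2.0, 1.0)],  # parallel path offset by 1.0: ADE 1.0
    [(1.0, 0.0), (2.0, 0.5)],  # drifts at the last step: ADE 0.25
]
print(min_over_k_loss(modes, gt))  # → (0.25, 1)
```

Returning the index of the winning mode as well as its error makes it easy to inspect which hypothesis the loss rewarded.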

Abstract

Predicting accurate future trajectories of pedestrians is essential for autonomous systems but remains challenging because predictors must adapt to different environments and domains. A common approach is to collect scenario-specific data and fine-tune the model via backpropagation. However, fine-tuning for each new scenario is often impractical when the model is deployed on edge devices. To address this challenge, we introduce TrajICL, an In-Context Learning (ICL) framework for pedestrian trajectory prediction that adapts to scenario-specific data at inference time without fine-tuning or weight updates. We propose a spatio-temporal similarity-based example selection (STES) method that selects relevant examples from previously observed trajectories within the same scene by identifying similar motion patterns at corresponding locations. To further refine this selection, we introduce prediction-guided example selection (PG-ES), which selects examples based on both the past trajectory and the predicted future trajectory rather than on the past trajectory alone. This allows the model to account for long-term dynamics when selecting examples. Finally, instead of relying on small real-world datasets with limited scenario diversity, we train our model on a large-scale synthetic dataset to strengthen its ability to predict by leveraging in-context examples. Extensive experiments demonstrate that TrajICL achieves strong adaptation across both in-domain and cross-domain scenarios, outperforming even fine-tuned approaches on multiple public benchmarks. Project Page: https://fujiry0.github.io/TrajICL-project-page/.
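STES ranks previously observed trajectories in the same scene by how closely they match the query in both location and motion. This summary does not give the scoring formula, so the sketch below assumes one: a hypothetical dissimilarity that adds the mean distance between positions (where the agents are) to the mean distance between per-step displacements (how they move), weighted by a hypothetical `alpha`. Pool entries are compared on their observed pasts only.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def displacements(traj: Sequence[Point]) -> List[Point]:
    """Per-step motion vectors (x[t+1] - x[t], y[t+1] - y[t])."""
    return [(b[0] - a[0], b[1] - a[1]) for a, b in zip(traj, traj[1:])]


def mean_dist(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Mean pointwise Euclidean distance between equal-length point sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def stes_score(query: Sequence[Point], candidate: Sequence[Point], alpha: float = 1.0) -> float:
    """Hypothetical spatio-temporal dissimilarity (lower means more similar).

    spatial:  how far apart the two trajectories are in scene coordinates
    temporal: how different their step-by-step motion is
    """
    spatial = mean_dist(query, candidate)
    temporal = mean_dist(displacements(query), displacements(candidate))
    return spatial + alpha * temporal


def select_examples(
    query: Sequence[Point], pool: Sequence[Sequence[Point]], k: int, alpha: float = 1.0
) -> List[int]:
    """Indices of the k pool trajectories that best match the query (assumed scoring)."""
    ranked = sorted(range(len(pool)), key=lambda i: stes_score(query, pool[i], alpha))
    return ranked[:k]


query = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
pool = [
    [(0.0, 0.1), (1.0, 0.1), (2.0, 0.1)],        # same motion, slightly offset
    [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0)],        # same start, different direction
    [(10.0, 10.0), (11.0, 10.0), (12.0, 10.0)],  # same motion, far away
]
print(select_examples(query, pool, k=2))  # → [0, 1]
```

The first candidate walks nearly the same path, the second starts at the same spot but heads elsewhere, and the third moves identically but in a distant part of the scene; combining the location and motion terms ranks them in that order, whereas either term alone would not.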

Paper Structure

This paper contains 23 sections, 5 equations, 11 figures, 6 tables.

Figures (11)

  • Figure 1: Illustration of real-world trajectory prediction scenarios and the adaptation pipeline. (a) The adaptation pipeline of traditional methods, where models are trained on scenario-specific data. (b) The adaptation pipeline of our proposed TrajICL, which automatically selects examples and adapts to novel scenarios by leveraging the scenario-specific examples without requiring training on scenario-specific data.
  • Figure 2: An illustration of our TrajICL framework. (a) The overall architecture includes an embedding layer, a trajectory encoder, an in-context-aware trajectory predictor, and a multi-modal decoder. (b) Rather than relying solely on past trajectories for the example selection, we introduce prediction-guided example selection, which leverages both past and predicted future trajectories to identify more relevant examples.
  • Figure 3: Performance of random example selection and the proposed STES at varying numbers of in-context examples.
  • Figure 4: Qualitative comparison between random example selection and our proposed PG-STES.
  • Figure 5: Qualitative results on MOTSynth, JRDB, WildTrack, and SDD. These examples show scenarios where TrajICL outperforms the Social-Transmotion baseline, with TrajICL capturing plausible motion patterns from the in-context examples.
  • ...and 6 more figures
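Figure 2(b) contrasts past-only example selection with prediction-guided example selection (PG-ES). The sketch below illustrates the idea under stated assumptions: the query's past is extended with a predicted future and compared against each stored example's past plus its observed future. This summary does not say which predictor supplies that future or how distances are computed, so `predictor` is a hypothetical callable defaulting to a constant-velocity stand-in, and the mean-Euclidean distance is likewise an assumption.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Example = Tuple[List[Point], List[Point]]  # (observed past, observed future)


def traj_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Mean pointwise Euclidean distance between equal-length trajectories."""
    if len(a) != len(b) or not a:
        raise ValueError("trajectories must be non-empty and of equal length")
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def constant_velocity(past: Sequence[Point], horizon: int) -> List[Point]:
    """Hypothetical stand-in predictor: repeat the last observed step."""
    (x0, y0), (x1, y1) = past[-2], past[-1]
    dx, dy = x1 - x0, y1 - y0
    return [(x1 + dx * (t + 1), y1 + dy * (t + 1)) for t in range(horizon)]


def pg_select(
    past: Sequence[Point],
    pool: Sequence[Example],
    k: int,
    predictor: Callable[[Sequence[Point], int], List[Point]] = constant_velocity,
) -> List[int]:
    """Rank pool examples by distance over past + predicted future, keep top k."""
    horizon = len(pool[0][1])
    query_full = list(past) + predictor(past, horizon)

    def score(i: int) -> float:
        ex_past, ex_future = pool[i]
        return traj_distance(query_full, list(ex_past) + list(ex_future))

    return sorted(range(len(pool)), key=score)[:k]


past = [(0.0, 0.0), (1.0, 0.0)]
pool = [
    ([(0.0, 0.0), (1.0, 0.0)], [(1.0, 1.0), (1.0, 2.0)]),  # same past, then turns
    ([(0.0, 0.0), (1.0, 0.0)], [(2.0, 0.0), (3.0, 0.0)]),  # same past, keeps straight
]
print(pg_select(past, pool, k=1))  # → [1]
```

Both pool entries share the query's past, so a past-only ranking cannot separate them; adding the predicted continuation ranks the straight-walking example first, which is the kind of long-term cue the abstract attributes to PG-ES.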