
Adapting to Length Shift: FlexiLength Network for Trajectory Prediction

Yi Xu, Yun Fu

TL;DR

The paper addresses Observation Length Shift in Transformer-based trajectory prediction by introducing the FlexiLength Network (FLN), a general framework that trains on multiple observation lengths and learns temporally invariant representations. FLN comprises FlexiLength Calibration (FLC), which uses shared encoders and temporal distillation to align predictions across short, medium, and long inputs, and FlexiLength Adaptation (FLA), which employs independent positional encoding and specialized layer normalization to reduce length-induced discrepancies. Results on ETH/UCY, nuScenes, and Argoverse 1 show that FLN consistently improves prediction accuracy over Isolated Training and adapts to unseen observation lengths after a single training run. The framework applies to Transformer-based models such as AgentFormer and HiVT. The work also includes ablations and normalization-shift analyses that justify the design choices, and it discusses the implications for robust trajectory forecasting when the observation length varies in practice.
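As a rough illustration of the "specialized layer normalization" idea, the sketch below keeps one gain/bias pair per observation length while the normalization arithmetic itself is shared. This is only one plausible reading of the summary, not the authors' implementation: the class name `LengthSpecificLayerNorm`, the example lengths 2, 6, and 8, and the pure-Python list representation are all assumptions made for illustration.

```python
import math


class LengthSpecificLayerNorm:
    """Hypothetical sketch: layer normalization with separate parameters
    for each observation length (names and lengths are illustrative)."""

    def __init__(self, dim, lengths, eps=1e-5):
        self.eps = eps
        # One independent (gain, bias) pair per observation length,
        # initialized to the identity transform like a standard LayerNorm.
        self.params = {h: ([1.0] * dim, [0.0] * dim) for h in lengths}

    def __call__(self, x, obs_len):
        # Pick the parameters that belong to this observation length.
        gain, bias = self.params[obs_len]
        # Standard layer-norm statistics over the feature dimension.
        mean = sum(x) / len(x)
        var = sum((v - mean) ** 2 for v in x) / len(x)
        inv_std = 1.0 / math.sqrt(var + self.eps)
        return [g * (v - mean) * inv_std + b for v, g, b in zip(x, gain, bias)]


# Assumed example lengths (short, medium, long).
ln = LengthSpecificLayerNorm(dim=4, lengths=[2, 6, 8])
features = [1.0, 2.0, 3.0, 4.0]
short_out = ln(features, obs_len=2)
```

Because each length owns its parameters, statistics learned for short inputs cannot overwrite those learned for long inputs, which is the kind of length-induced discrepancy the summary says FLA is meant to reduce.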

Abstract

Trajectory prediction plays an important role in various applications, including autonomous driving, robotics, and scene understanding. Existing approaches mainly focus on developing compact neural networks to increase prediction precision on public datasets, typically employing a standardized input duration. However, a notable issue arises when these models are evaluated with varying observation lengths, leading to a significant performance drop, a phenomenon we term the Observation Length Shift. To address this issue, we introduce a general and effective framework, the FlexiLength Network (FLN), to enhance the robustness of existing trajectory prediction techniques against varying observation periods. Specifically, FLN integrates trajectory data with diverse observation lengths, incorporates FlexiLength Calibration (FLC) to acquire temporally invariant representations, and employs FlexiLength Adaptation (FLA) to further refine these representations for more accurate future trajectory predictions. Comprehensive experiments on multiple datasets, i.e., ETH/UCY, nuScenes, and Argoverse 1, demonstrate the effectiveness and flexibility of our proposed FLN framework.

Paper Structure (18 sections, 8 equations, 10 figures, 7 tables)

This paper contains 18 sections, 8 equations, 10 figures, 7 tables.

Figures (10)

  • Figure 1: The Observation Length Shift phenomenon is a common issue in the trajectory prediction task. The AgentFormer model is trained with a standard observation length of 8 timesteps and tested at varying lengths to compare with Isolated Training (IT). For each dataset, the bar groups from left to right correspond to observation lengths of 2, 4, and 6 timesteps, respectively.
  • Figure 2: ADE$_{5}$ and FDE$_{5}$ results for the AgentFormer model, which is trained on the nuScenes dataset using a standard observation length of 4 timesteps, and tested at shorter observation lengths of 2 and 3 timesteps. These results are compared to those obtained through Isolated Training (IT).
  • Figure 3: Layer Normalization statistics in two different layers of the Transformer encoder within the AgentFormer model, trained in isolation on the ETH dataset at observation lengths of 2, 6, and 8 timesteps.
  • Figure 4: Illustration of our FlexiLength Network (FLN). The map encoding branch is omitted for simplicity. During training, with inputs of varying observation lengths $H^{S}$, $H^{M}$, and $H^{L}$, we utilize FlexiLength Calibration (FLC) to acquire temporally invariant representations. Furthermore, FlexiLength Adaptation (FLA) is employed to align these invariant representations with different sub-networks, thereby augmenting the model's capabilities. During inference, the sub-network with the closest match in observation length is activated.
  • Figure 5: Performance on five ETH/UCY datasets using the AgentFormer model, measured with ADE$_{20}$. These results are compared with those of the baseline model and Isolated Training (IT), showcasing notable improvements achieved by our FLN.
  • ...and 5 more figures
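
The inference rule in the Figure 4 caption (activate the sub-network whose observation length is the closest match) can be sketched as a nearest-value lookup. This is a minimal sketch under stated assumptions: the function name `select_subnetwork` and the example lengths are hypothetical, and the tie-breaking rule (prefer the shorter trained length) is a choice made here, since the caption does not say how ties are resolved.

```python
def select_subnetwork(obs_len, trained_lengths):
    """Return the trained observation length closest to obs_len.

    Hypothetical helper: ties are broken in favor of the shorter
    trained length, which is an assumption of this sketch.
    """
    if not trained_lengths:
        raise ValueError("trained_lengths must not be empty")
    # Sort key: distance to the requested length first, then the length itself.
    return min(trained_lengths, key=lambda h: (abs(h - obs_len), h))


# Assumed short, medium, and long training lengths.
trained = [2, 6, 8]
chosen_for_3 = select_subnetwork(3, trained)    # 2 is nearest to 3
chosen_for_10 = select_subnetwork(10, trained)  # 8 is nearest to 10
```

A lookup of this kind lets a single trained model serve observation lengths it never saw during training, at the cost of routing them through parameters tuned for a neighboring length.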