
DyTTP: Trajectory Prediction with Normalization-Free Transformers

JianLin Zhu, HongKuo Niu

TL;DR

This work tackles efficient, robust trajectory forecasting for autonomous driving by removing normalization overhead from transformer backbones and leveraging lightweight ensembling. It introduces DyTTP, which replaces Layer Normalization in a HiVT-based backbone with DynamicTanh (DyT) and augments it with a snapshot ensemble that uses cosine-annealing learning rate cycles to capture diverse model hypotheses. The training objective combines regression and multi-modal classification losses, enabling accurate multi-step trajectory prediction. On the Argoverse dataset, the method improves inference speed and robustness while maintaining competitive predictive accuracy, demonstrating the value of normalization-free transformers combined with snapshot ensembling for real-time autonomous-vehicle forecasting.
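The summary states only that the objective combines a regression loss with a multi-modal classification loss. One common way to combine the two is winner-take-all training: regress only the predicted mode closest to the ground truth, and train the mode scores with cross-entropy toward that mode. The sketch below assumes this scheme, with mean displacement error standing in for the regression term; the loss forms, the `cls_weight` parameter, and all function names are hypothetical, not DyTTP's confirmed implementation.

```python
import math

def mean_displacement(traj, gt):
    # Average Euclidean distance between predicted and ground-truth points.
    return sum(math.dist(p, g) for p, g in zip(traj, gt)) / len(gt)

def log_softmax(logits):
    # Numerically stable log-softmax over the K mode scores.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]

def multimodal_loss(pred_modes, mode_logits, gt, cls_weight=1.0):
    """Hypothetical winner-take-all objective.

    pred_modes:  K predicted trajectories, each a list of T (x, y) points
    mode_logits: K unnormalized confidence scores, one per mode
    gt:          ground-truth trajectory, a list of T (x, y) points
    """
    errors = [mean_displacement(m, gt) for m in pred_modes]
    best_k = min(range(len(errors)), key=errors.__getitem__)
    reg_loss = errors[best_k]                     # regress only the closest mode
    cls_loss = -log_softmax(mode_logits)[best_k]  # push probability mass onto it
    return reg_loss + cls_weight * cls_loss, best_k

gt = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
modes = [
    [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)],  # offset by 1 unit laterally
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],  # exact match
]
loss, best = multimodal_loss(modes, [0.0, 0.0], gt)
print(best, round(loss, 4))  # prints: 1 0.6931
```

Because the second mode matches exactly, the regression term is zero and the loss reduces to the cross-entropy of two equal scores, log 2.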

Abstract

Accurate trajectory prediction is a cornerstone of safe autonomous driving, where understanding the dynamic behavior of surrounding agents is crucial. Transformer-based architectures have shown significant promise in capturing complex spatio-temporal dependencies. However, their reliance on normalization layers can introduce computational overhead and training instabilities. In this work, we present a two-fold approach to address these challenges. First, we integrate DynamicTanh (DyT), a recently proposed normalization-free alternative for transformers, into the backbone in place of traditional Layer Normalization. This modification simplifies the network architecture and improves inference stability. We are the first to apply DyT to the trajectory prediction task. Second, we employ a snapshot ensemble strategy to further boost prediction performance. Using cyclical learning rate scheduling, multiple model snapshots are captured during a single training run. These snapshots are then aggregated via simple averaging at inference time, allowing the model to benefit from diverse hypotheses without incurring substantial additional computational cost. Extensive experiments on the Argoverse datasets demonstrate that the combined approach significantly improves prediction accuracy, inference speed, and robustness across diverse driving scenarios. This work underscores the potential of normalization-free transformer designs augmented with lightweight ensemble techniques for advancing trajectory forecasting in autonomous vehicles.
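The abstract describes snapshot ensembling only at a high level. A minimal sketch, assuming the cyclic cosine annealing schedule commonly used for snapshot ensembles: the learning rate restarts at its maximum at the start of each cycle, anneals toward zero, and a snapshot is saved at the end of each cycle. The step-based schedule, the cycle count, and the function names are hypothetical. The averaging step also assumes each snapshot's trajectories are already aligned (the same output index refers to the same hypothesis across snapshots); how DyTTP aligns multi-modal outputs is not stated here.

```python
import math

def cyclic_cosine_lr(step, total_steps, num_cycles, lr_max):
    # LR restarts at lr_max at the start of every cycle and
    # decays toward 0 by cosine annealing within the cycle.
    cycle_len = math.ceil(total_steps / num_cycles)
    pos = step % cycle_len
    return lr_max / 2 * (math.cos(math.pi * pos / cycle_len) + 1)

def snapshot_steps(total_steps, num_cycles):
    # A snapshot is saved at the last step of each cycle, where the LR is lowest.
    cycle_len = math.ceil(total_steps / num_cycles)
    return [min((c + 1) * cycle_len, total_steps) - 1 for c in range(num_cycles)]

def average_trajectories(snapshot_preds):
    # Simple averaging: each snapshot predicts T (x, y) points for the same agent;
    # the ensemble output is their per-timestep mean.
    n = len(snapshot_preds)
    return [tuple(sum(coord) / n for coord in zip(*points))
            for points in zip(*snapshot_preds)]

print(snapshot_steps(100, 4))  # prints: [24, 49, 74, 99]
print(average_trajectories([[(0.0, 0.0), (2.0, 2.0)],
                            [(2.0, 0.0), (4.0, 2.0)]]))  # prints: [(1.0, 0.0), (3.0, 2.0)]
```

Only one training run is needed to collect the snapshots; at inference each snapshot still runs a forward pass before the outputs are averaged.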

Paper Structure

This paper contains 18 sections, 5 equations, 2 figures, 3 tables.

Figures (2)

  • Figure 1: Overview of DyTTP. AA DyT, Temp DyT, AL DyT, and Global DyT denote the agent-agent, temporal, agent-lane, and global interaction modules built on normalization-free transformers, respectively.
  • Figure 2: Original transformer block (left) and DynamicTanh layer (right), a straightforward replacement for traditional Layer Normalization.
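Figure 2 presents DyT as a drop-in replacement for Layer Normalization. A minimal sketch of both layers side by side, assuming the formulation from the original DyT proposal, DyT(x) = γ ⊙ tanh(αx) + β with a learnable scalar α and per-channel γ and β. Unlike LayerNorm, DyT computes no mean or variance over the feature dimension. The initial α = 0.5 is an assumption (DyTTP's value is not given here), and plain Python lists are used only so the sketch runs without a deep-learning framework; in a real backbone, α, γ, and β would be trainable tensors.

```python
import math

class DynamicTanh:
    """Sketch of DyT: y = gamma * tanh(alpha * x) + beta, applied elementwise."""

    def __init__(self, dim, alpha_init=0.5):  # alpha_init is an assumed default
        self.alpha = alpha_init        # learnable scalar shared by all channels
        self.gamma = [1.0] * dim       # learnable per-channel scale
        self.beta = [0.0] * dim        # learnable per-channel shift

    def __call__(self, x):
        # No statistics over x are computed: each channel is squashed independently.
        return [g * math.tanh(self.alpha * v) + b
                for v, g, b in zip(x, self.gamma, self.beta)]

def layer_norm(x, gamma, beta, eps=1e-5):
    # Standard LayerNorm for comparison: needs the mean and variance of x.
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [g * (v - mean) * inv_std + b for v, g, b in zip(x, gamma, beta)]

x = [0.0, 2.0, -2.0]
print(DynamicTanh(dim=3)(x))
print(layer_norm(x, [1.0] * 3, [0.0] * 3))
```

In a pre-norm block such as the one on the left of Figure 2, the sketch would sit wherever LayerNorm appears, for example in a residual update of the form x + attention(dyt(x)), where `attention` is a placeholder for the block's attention sublayer.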