Map-Free Trajectory Prediction with Map Distillation and Hierarchical Encoding

Xiaodong Liu, Yucheng Xing, Xin Wang

TL;DR

MFTP is a Map-Free Trajectory Prediction method that needs no HD maps at inference time yet still benefits from map priors during training via knowledge distillation. It also introduces an iterative decoder that sequentially decodes trajectory queries to generate the final predictions.

Abstract

Reliable motion forecasting of surrounding agents is essential for ensuring the safe operation of autonomous vehicles. Many existing trajectory prediction methods rely heavily on high-definition (HD) maps as strong driving priors. However, the availability and accuracy of these priors are not guaranteed, owing to the substantial cost of building the maps, vehicle localization errors, or ongoing road construction. In this paper, we introduce MFTP, a Map-Free Trajectory Prediction method that offers several advantages. First, it eliminates the need for HD maps during inference while still benefiting from map priors during training via knowledge distillation. Second, we present a novel hierarchical encoder that effectively extracts spatial-temporal agent features and aggregates them into multiple trajectory queries. Third, we introduce an iterative decoder that sequentially decodes trajectory queries to generate the final predictions. Extensive experiments show that our approach achieves state-of-the-art performance on the Argoverse dataset under the map-free setting.
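To make the distillation idea concrete, the sketch below shows one common way a map-free student can be pulled toward a frozen map-based teacher: a mean squared error between matching intermediate features, added to the usual prediction loss. This is an illustration under assumptions, not the paper's exact objective. The choice of MSE, the per-layer weights, and the names `distillation_loss`, `total_loss`, and `kd_weight` are all hypothetical.

```python
def mse(a, b):
    """Mean squared error between two equally shaped feature matrices (lists of rows)."""
    total, count = 0.0, 0
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            total += (x - y) ** 2
            count += 1
    return total / count


def distillation_loss(student_feats, teacher_feats, weights=None):
    """Weighted sum of per-layer feature MSE terms (assumed form of the KD loss).

    student_feats / teacher_feats: lists of feature matrices, one per
    distilled layer. Teacher features are plain constants here, which
    mirrors a pre-trained teacher that is not updated.
    """
    if weights is None:
        weights = [1.0] * len(student_feats)
    return sum(w * mse(s, t) for w, s, t in zip(weights, student_feats, teacher_feats))


def total_loss(task_loss, student_feats, teacher_feats, kd_weight=1.0):
    """Hypothetical student objective: prediction loss plus a distillation term."""
    return task_loss + kd_weight * distillation_loss(student_feats, teacher_feats)


# One distilled layer, two agents, two feature channels.
student = [[[1.0, 2.0], [3.0, 4.0]]]
teacher = [[[1.0, 2.0], [3.0, 6.0]]]
loss = total_loss(0.5, student, teacher, kd_weight=2.0)
```

In this toy case only one of the four feature entries differs, so the distillation term is small relative to what a fully mismatched student would incur. At inference time the teacher and the distillation term are dropped, which is what lets the student run without map input.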

Paper Structure

This paper contains 27 sections, 10 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: The differences between existing methods and ours. Existing map-based methods utilize map information during both training and inference, whereas map-free methods do not. In contrast, our method employs a pre-trained map-based teacher network to distill map priors into a map-free student network.
  • Figure 2: Overall framework of MFTP. MFTP has a pre-trained map-based teacher model and a map-free student model. The student has the same architecture as the teacher except for the map-related modules. The hierarchical agent features are progressively extracted after agent-agent temporal and spatial attention through the Feature Aggregation (FA) module in the encoder, and these features are then fused to form $K$ trajectory queries, corresponding to $K$ multimodal future trajectories. In the teacher network, the agents learn map priors through the agent-map attention module in the encoder stage and the query-map attention module in the decoder stage. Through knowledge distillation of intermediate features, we transfer map priors into the map-free student network.
  • Figure 3: Illustration of hierarchical feature aggregation and fusion. When provided with multiple features of a single agent, our approach employs multiple queries to extract different levels of features progressively. The first query aggregates all agent features (dashed line) while the second only gathers features for every 2 time intervals (solid line). Subsequently, these features are fused into a single trajectory query, encompassing the hierarchical spatial-temporal features of the agent.
  • Figure 4: Qualitative results on Argoverse validation set. (a) illustrates the performance of our map-free model on intersection scenarios with various driving behaviors (e.g., go-straight, left-turn, big left-turn and right-turn from left to right) without leveraging map priors. (b) demonstrates that, with the help of knowledge distillation (KD), our map-free model can predict future trajectories more closely aligned with the ground truth. (a) and (b) share the same figure legend. Best viewed in color and zoomed in.
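
The hierarchical aggregation described in the Figure 3 caption can be sketched roughly as follows. This is a minimal illustration, assuming softmax dot-product attention for the pooling step and plain concatenation for the fusion step. The caption specifies neither, and the function names, the `2 ** level` stride rule, and the choice to start subsampling at the first time step are hypothetical.

```python
import math


def attention_pool(query, feats):
    """Softmax dot-product attention of one query vector over a list of feature vectors."""
    d = len(query)
    scores = [sum(q * f for q, f in zip(query, feat)) / math.sqrt(d) for feat in feats]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    return [sum(w * feat[i] for w, feat in zip(weights, feats)) for i in range(d)]


def hierarchical_aggregate(feats, queries):
    """Pool per-time-step features at several temporal resolutions, then fuse.

    feats:   per-time-step feature vectors of a single agent.
    queries: one query vector per level. Level 0 attends to every time
             step; level 1 attends to every 2nd step (assumed stride rule).
    Returns the concatenation of the pooled vectors (assumed fusion).
    """
    fused = []
    for level, query in enumerate(queries):
        stride = 2 ** level
        fused.extend(attention_pool(query, feats[::stride]))
    return fused


# Four time steps, two channels; zero queries give uniform attention weights.
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
queries = [[0.0, 0.0], [0.0, 0.0]]
trajectory_query = hierarchical_aggregate(feats, queries)
```

With uniform attention, the first level averages all four steps while the second averages only the first and third, so the two halves of `trajectory_query` summarize the same agent at different temporal resolutions. In a trained model the queries would be learned and a projection layer would typically follow the fusion step.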