Multi-Domain Enhanced Map-Free Trajectory Prediction with Selective Attention

Wenyi Xiong, Jian Chen

TL;DR

This work tackles map-free trajectory prediction in complex driving scenarios by filtering redundant information across time, space, and frequency domains. It introduces a frequency-enhanced temporal model using Mixture of Experts, a dual selective attention mechanism to prune irrelevant interactions, and a multimodal decoder supervised with patch-level losses. The method achieves competitive results on NuScenes and Argoverse, often surpassing other map-free models and approaching map-based baselines. The patch-wise supervision further stabilizes predictions and improves trajectory consistency in multi-agent environments.

Abstract

Trajectory prediction is crucial for the reliability and safety of autonomous driving systems, yet it remains a challenging task in complex interactive scenarios. Existing methods often struggle to extract valuable scene information from redundant data, which reduces both computational efficiency and prediction accuracy, especially when agent interactions are intricate. To address these challenges, we propose a novel map-free trajectory prediction algorithm that operates across the temporal, spatial, and frequency domains. Specifically, in temporal information processing, we use a Mixture of Experts (MoE) mechanism to adaptively select critical frequency components. Concurrently, we extract these components and integrate multi-scale temporal features. Subsequently, a selective attention module is proposed to filter out redundant information in both temporal sequences and spatial interactions. Finally, we design a multimodal decoder. Under the supervision of patch-level and point-level losses, we obtain reasonable trajectory results. Experiments on the NuScenes dataset demonstrate the superiority of our algorithm, validating its effectiveness in handling complex interactive scenarios.
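
The abstract does not specify how the MoE selects frequency components, so the following is only a minimal sketch of one possible reading: each "expert" is a fixed low-pass mask, and softmax gate weights blend the masks before the signal is transformed back to the time domain. The function names, the low-pass experts, the cutoffs, and the hand-set gate logits are all assumptions for illustration; in a trained model the gate would be learned from the input.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform of a sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Inverse DFT; returns the real part, since the input signal is real."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def low_pass_mask(n, cutoff):
    """Hypothetical expert: 1.0 for bins at or below `cutoff`, else 0.0.

    The mask is symmetric in k and n - k so the filtered signal stays real.
    """
    return [1.0 if min(k, n - k) <= cutoff else 0.0 for k in range(n)]

def moe_frequency_filter(signal, gate_logits, cutoffs):
    """Blend several low-pass experts with softmax gate weights (assumed design)."""
    n = len(signal)
    weights = softmax(gate_logits)
    masks = [low_pass_mask(n, c) for c in cutoffs]
    # Soft mask: gate-weighted sum of the expert masks, one value per bin.
    soft_mask = [sum(w * mask[k] for w, mask in zip(weights, masks))
                 for k in range(n)]
    spectrum = dft(signal)
    return idft([m * s for m, s in zip(soft_mask, spectrum)])

# Example: filter one coordinate of a short, slightly noisy history.
history_x = [0.0, 1.0, 2.1, 2.9, 4.0, 5.2, 5.9, 7.0]
smoothed = moe_frequency_filter(history_x, gate_logits=[0.5, 1.0, -0.5],
                                cutoffs=[1, 2, 4])
```

Because every expert passes the zero-frequency bin, the mean of the signal is preserved, while components above all cutoffs are removed entirely.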

Paper Structure

This paper contains 20 sections, 20 equations, 2 figures, 2 tables.

Figures (2)

  • Figure 1: Comparison of workflows in multiple domains. In the temporal domain, our approach filters out redundant time nodes compared to previous methods. In the spatial domain, we reduce the number of redundant interactions considered. Additionally, we adjust the frequency distribution to suppress high-frequency noise.
  • Figure 2: The map-free network architecture comprises three core components: the Frequency-Temporal Selective Attention Module (FTSAM), the Spatial Selective Attention Module (SSAM), and the Multimodal Decoder. For historical trajectories, MoE-based frequency-domain filtering and multi-temporal-granularity modeling are adopted. TSAM and SSAM then apply reweighting to eliminate redundant temporal features and interaction nodes, respectively. The decoder generates trajectories supervised by both point-level and patch-level losses.
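
The reweighting step in the selective attention modules can be illustrated with a small sketch. The caption does not say how redundant nodes are chosen, so this example assumes a hard top-k rule on scaled dot-product scores: only the `keep` highest-scoring keys receive non-zero weight, and the softmax is renormalized over them. The function name, the top-k rule, and the `keep` parameter are hypothetical; the paper's modules may use a learned or soft selection instead.

```python
import math

def selective_attention(query, keys, values, keep):
    """Scaled dot-product attention restricted to the `keep` best-scoring keys."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    # Indices of the `keep` largest scores; the rest are treated as redundant.
    kept = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:keep]
    # Softmax over the kept scores only, so pruned nodes get exactly zero weight.
    top = max(scores[i] for i in kept)
    exps = {i: math.exp(scores[i] - top) for i in kept}
    total = sum(exps.values())
    weights = [exps.get(i, 0.0) / total for i in range(len(scores))]
    dim = len(values[0])
    output = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return output, weights

# Example: one target agent attending to three neighbours, keeping two.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
output, weights = selective_attention(query, keys, values, keep=2)
```

In this example the third neighbour has the lowest score, so it is pruned and its value vector does not contribute to the output.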