Fine-Grained Behavior and Lane Constraints Guided Trajectory Prediction Method

Wenyi Xiong, Jian Chen, Ziheng Qi

TL;DR

This work tackles the challenge of predicting fine-grained, continuous future vehicle behavior under lane constraints by introducing BLNet, a dual-stream attention framework that learns behavior state queries and lane queries from HD-map and historical data. A two-stage decoder first generates multimodal trajectory proposals and then refines them using future-motion features and lane continuity, with the auxiliary losses $\mathcal{L}_{behav}$ and $\mathcal{L}_{lane}$ supervising the two sets of queries. Extensive experiments on nuScenes and Argoverse show state-of-the-art performance across multiple multimodal metrics (e.g., $\text{minADE}$ and $\text{minFDE}$), and comprehensive ablations verify each component's contribution. The approach yields more realistic, intent-aware trajectory predictions at a practical computational cost, which supports safer and more reliable planning in autonomous driving. Future work focuses on adaptive feature fusion between environment-driven and agent-driven constraints.
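For readers unfamiliar with the metrics named above, the following is a minimal sketch of how $\text{minADE}$ and $\text{minFDE}$ are commonly computed over $K$ proposed trajectories. The function name `min_ade_fde` and the toy coordinates are illustrative assumptions, and individual benchmarks may differ in details (for example, in how the best mode is chosen).

```python
import math


def min_ade_fde(proposals, ground_truth):
    """Return (minADE, minFDE) over K proposed trajectories.

    proposals: list of K trajectories, each a list of (x, y) points.
    ground_truth: list of (x, y) points, same length as each proposal.
    """
    ades, fdes = [], []
    for traj in proposals:
        # Per-timestep Euclidean distance to the ground-truth point.
        errors = [math.dist(p, g) for p, g in zip(traj, ground_truth)]
        ades.append(sum(errors) / len(errors))  # average displacement error
        fdes.append(errors[-1])                 # final displacement error
    # Here the two minima are taken independently over the K modes.
    return min(ades), min(fdes)


# Hypothetical toy data: two proposals for a three-step future.
ground_truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
proposals = [
    [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)],  # constant 1 m lateral offset
    [(0.0, 0.0), (1.0, 0.0), (2.0, 3.0)],  # accurate until the last step
]
min_ade, min_fde = min_ade_fde(proposals, ground_truth)
```

Both metrics reward a predictor when at least one of its $K$ modes is close to the observed future, which is why they are the standard choice for multimodal prediction.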

Abstract

Trajectory prediction is a critical component of autonomous driving systems and has attracted considerable research attention. Existing prediction algorithms focus on extracting more detailed scene features or selecting more reasonable trajectory destinations. However, when the future movements of the target vehicle are dynamic and evolving, these algorithms cannot provide a fine-grained and continuous description of future behaviors and lane constraints, which degrades prediction accuracy. To address this challenge, we present BLNet, a novel dual-stream architecture that integrates behavioral intention recognition and lane constraint modeling through parallel attention mechanisms. The framework generates fine-grained behavior state queries (capturing spatio-temporal movement patterns) and lane queries (encoding lane topology constraints), each supervised by its own auxiliary loss. Subsequently, a two-stage decoder first produces trajectory proposals and then performs point-level refinement by jointly incorporating the continuity of the traversed lanes and future motion features. Extensive experiments on two large datasets, nuScenes and Argoverse, show that our network achieves significant performance gains over existing direct regression and goal-based algorithms.
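The idea of two parallel attention streams can be illustrated with generic scaled dot-product cross-attention, in which a set of queries pools information from shared scene features. The sketch below is not the authors' implementation: the function names, the feature dimension, and the toy values are all assumptions, and the learned projections, auxiliary losses, and decoder are omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query returns a weighted mix of values."""
    dim = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        mixed = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(mixed)
    return outputs


# Hypothetical encoded scene features (three tokens, two dimensions each).
scene_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Two independent query sets attend to the same features in parallel.
behavior_queries = [[1.0, 0.0]]
lane_queries = [[0.0, 1.0], [0.5, 0.5]]

behavior_context = cross_attention(behavior_queries, scene_features, scene_features)
lane_context = cross_attention(lane_queries, scene_features, scene_features)
```

Because the two streams share inputs but not queries, each can be supervised separately, which is consistent with the abstract's description of one auxiliary loss per query type.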

Paper Structure

This paper contains 29 sections, 26 equations, 6 figures, 8 tables, 1 algorithm.

Figures (6)

  • Figure 1: Illustration of different prediction methods. (a) Direct regression. Unreasonable trajectories may be generated due to the lack of a spatial prior. (b) Goal-based method. There can be different paths to the same goal.
  • Figure 2: An overview of the proposed algorithm framework. Unlike existing one-token-one-trajectory (or one-goal-one-trajectory) algorithms, our algorithm uses fine-grained behavior queries and lane queries to guide the predictions. Finally, lane continuity and future motion features are aggregated to refine the trajectories at the point level.
  • Figure 3: The pipeline of our proposed algorithm. After vectorization, agent and map information is fed into the encoder and processed with an RNN and a transformer to obtain the corresponding high-dimensional features. Subsequently, two attention branches (a lane attention branch and a behavior state attention branch) produce temporary lane constraints and behavior queries. Finally, a two-stage decoder predicts the target trajectories and refines them at the point level using lane continuity and future behavior.
  • Figure 4: Scene information vectorization. After vectorization, lane segments and trajectories are represented by vectors.
  • Figure 5: (a) The two-stage decoder. We first use a simple GRU decoder to predict the target proposals. Subsequently, we use lane continuity and future behavior to refine the proposals. (b) Illustration of distance-based lane segment selection.
  • ...and 1 more figure
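The captions for Figures 4 and 5(b) mention scene vectorization and distance-based lane segment selection. The sketch below shows one plausible reading of those two steps. The helper names, the midpoint-distance rule, and the `radius` threshold are assumptions made for illustration; the captions do not specify the paper's actual selection criterion.

```python
import math


def vectorize(polyline):
    """Turn a polyline of (x, y) points into consecutive (start, end) vectors."""
    return [(polyline[i], polyline[i + 1]) for i in range(len(polyline) - 1)]


def select_lane_segments(point, lane_vectors, radius):
    """Keep vectors whose midpoint lies within `radius` of `point`, nearest first."""
    scored = []
    for start, end in lane_vectors:
        midpoint = ((start[0] + end[0]) / 2, (start[1] + end[1]) / 2)
        distance = math.dist(point, midpoint)
        if distance <= radius:
            scored.append((distance, (start, end)))
    scored.sort(key=lambda item: item[0])
    return [vector for _, vector in scored]


# Hypothetical straight lane centerline sampled every 10 m.
lane = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0), (30.0, 0.0)]
lane_vectors = vectorize(lane)

# Select the lane vectors near one point of a trajectory proposal.
nearby = select_lane_segments((12.0, 1.0), lane_vectors, radius=8.0)
```

Restricting refinement to nearby lane vectors keeps the number of map elements per trajectory point small, which is one reason a distance-based filter is a common design choice.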