Fine-Grained Behavior and Lane Constraints Guided Trajectory Prediction Method
Wenyi Xiong, Jian Chen, Ziheng Qi
TL;DR
This work tackles the challenge of predicting fine-grained, continuous future vehicle behavior under lane constraints by introducing BLNet, a dual-stream attention framework that learns behavior state queries and lane queries from HD-map and historical data. A two-stage decoder first generates multimodal trajectory proposals and then refines them using future-motion features and lane continuity, with auxiliary losses $\mathcal{L}_{behav}$ and $\mathcal{L}_{lane}$ guiding the queries. Extensive experiments on nuScenes and Argoverse show state-of-the-art performance across multiple multimodal metrics (e.g., $\text{minADE}$ and $\text{minFDE}$), and comprehensive ablations verify each component’s contribution. The approach delivers more realistic, intent-aware trajectory predictions with practical compute, supporting safer and more reliable planning in autonomous driving. Future work will focus on adaptive feature fusion between environment and agent-driven constraints.
Abstract
Trajectory prediction is a critical component of autonomous driving systems and has attracted considerable research attention. Existing prediction algorithms focus on extracting more detailed scene features or selecting more reasonable trajectory destinations. However, when the future motion of the target vehicle is dynamic and evolving, these algorithms cannot provide a fine-grained, continuous description of future behaviors and lane constraints, which degrades prediction accuracy. To address this challenge, we present BLNet, a novel dual-stream architecture that integrates behavioral intention recognition and lane constraint modeling through parallel attention mechanisms. The framework generates fine-grained behavior state queries (capturing spatial-temporal movement patterns) and lane queries (encoding lane topology constraints), each supervised by its own auxiliary loss. A two-stage decoder then produces trajectory proposals and performs point-level refinement by jointly incorporating the continuity of passed lanes and future motion features. Extensive experiments on two large datasets, nuScenes and Argoverse, show that our network achieves significant performance gains over existing direct regression and goal-based algorithms.
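The multimodal metrics named in the TL;DR, $\text{minADE}$ and $\text{minFDE}$, score a set of K candidate trajectories by the candidate closest to the ground truth. The following is a minimal sketch of the commonly used definitions, not the evaluation code of this work. The function name `min_ade_fde` and the toy trajectories are hypothetical. The sketch takes the minimum independently for each metric, whereas some benchmark protocols report the average error of the candidate with the best endpoint instead.

```python
import math


def min_ade_fde(proposals, ground_truth):
    """Return (minADE, minFDE) over K candidate trajectories.

    proposals: list of K trajectories, each a list of (x, y) points.
    ground_truth: one trajectory with the same number of points.
    Assumption: the minimum is taken separately for each metric.
    """
    ades, fdes = [], []
    for traj in proposals:
        if len(traj) != len(ground_truth):
            raise ValueError("proposal and ground truth lengths differ")
        # Point-wise Euclidean displacement at every future time step.
        dists = [
            math.hypot(p[0] - g[0], p[1] - g[1])
            for p, g in zip(traj, ground_truth)
        ]
        ades.append(sum(dists) / len(dists))  # average displacement error
        fdes.append(dists[-1])                # final displacement error
    return min(ades), min(fdes)


# Hypothetical toy data: two candidate modes, three future time steps.
ground_truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0)]
proposals = [
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],  # close early, misses the endpoint
    [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)],  # offset early, exact endpoint
]
min_ade, min_fde = min_ade_fde(proposals, ground_truth)
```

In this toy case the first candidate gives the lowest average error and the second gives the lowest endpoint error, so the two metrics can be attained by different modes.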
