Utilizing Navigation Paths to Generate Target Points for Enhanced End-to-End Autonomous Driving Planning

Yuanhua Shen, Jun Li

TL;DR

NTT tackles the lack of explicit driving intent in end-to-end autonomous driving by using a navigation path to constrain planning. It first generates a target point $p_t$ from the navigation path and then completes the trajectory $\hat{T} \in \mathbb{R}^{k\times 2}$ using scene context and learned representations, keeping the plan aligned with the navigation path while remaining safe in changing environments. With a two-stage training regime and attention-based fusion with scene tokens, NTT achieves state-of-the-art planning performance on nuScenes, reducing both planning displacement error and collision rate, and ablations confirm the critical role of navigation-guided target generation. The work demonstrates that integrating navigation information into end-to-end planning can yield clearer driving intent and safer, more reliable trajectories for practical autonomous driving systems.

Abstract

In recent years, end-to-end autonomous driving frameworks have been shown to not only enhance perception performance but also improve planning capabilities. However, most previous end-to-end autonomous driving frameworks have focused primarily on enhancing environmental perception while neglecting the learning of autonomous vehicle driving intent, which refers to the vehicle's intended direction of travel. In planning, the autonomous vehicle's direction is clear and well-defined, yet this crucial aspect has often been overlooked. This paper introduces NTT (Navigation to Target for Trajectory planning), a method within an end-to-end framework for autonomous driving. NTT generates the planned trajectory in two steps. First, it generates the future target point for the autonomous vehicle on the basis of the navigation path. Then, it produces the complete planned trajectory on the basis of this target point. On the one hand, generating the target point for the autonomous vehicle from the navigation path enables the vehicle to learn a clear driving intent. On the other hand, generating the trajectory on the basis of the target point allows for a flexible planned trajectory that can adapt to complex environmental changes, thereby enhancing the safety of the planning process. Our method achieved excellent planning performance on the widely used nuScenes dataset and its effectiveness was validated through ablation experiments.

Paper Structure (16 sections, 14 equations, 4 figures, 3 tables)

Figures (4)

  • Figure 1: Previous end-to-end planning approaches often relied on low-quality navigation information or treated it merely as a prediction task (left), leading to high uncertainty in the driving intent of planned trajectories. We advocate leveraging the navigation path to constrain planning (right), which results in more accurate driving intent for the ego vehicle.
  • Figure 2: The NTT framework. Given surround-view images as inputs, an image feature network and a BEV encoder project the image features into the BEV feature space. The scene encoder learns scene tokens (including both agent and map tokens) from this space, which can be decoded into agent and map representations using respective heads. In the planning module, the planned trajectory is generated in two steps. First, the target generation module takes the navigation path, map information (decoded via the map head), ego pose, and scene tokens as inputs to produce the target point. The target point is then integrated with the scene tokens to generate the final planned trajectory.
  • Figure 3: Description of the target generation module. In the navigation-aware target encoder, we use the navigation path as prior information and the scene information as context to learn the probability distribution of dense target candidates. The candidate point with the highest probability value is selected as the final target point.
  • Figure 4: Visual comparison of NTT and VAD (Jiang et al., 2023). We provide comparative visualizations for daytime, cloudy, and nighttime conditions, along with visual results for perception, prediction, and planning.
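
The selection step described in the Figure 3 caption (score dense target candidates, then keep the most probable one) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `softmax` and `select_target`, the toy candidate coordinates, and the raw scores are all hypothetical. In NTT the scores would come from the navigation-aware target encoder, and normalizing them with a softmax is an assumption made here for illustration.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) position; the coordinate frame is assumed


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn raw scores into a probability distribution (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_target(
    candidates: Sequence[Point], scores: Sequence[float]
) -> Tuple[Point, List[float]]:
    """Return the candidate with the highest probability and all probabilities.

    `scores` stands in for the output of a learned encoder (hypothetical here).
    """
    if not candidates:
        raise ValueError("at least one candidate is required")
    if len(candidates) != len(scores):
        raise ValueError("candidates and scores must have the same length")
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return candidates[best], probs


# Toy example: three candidate points spaced along a navigation path.
candidates = [(0.0, 5.0), (1.0, 10.0), (2.0, 15.0)]
scores = [0.1, 2.0, 0.5]  # made-up values, not model outputs
target, probs = select_target(candidates, scores)
```

Taking the argmax yields a single target point for the trajectory stage. A sampling-based or top-k selection would be an alternative design, but the caption describes picking the highest-probability candidate.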