Path Planning based on 2D Object Bounding-box

Yanliang Huang, Liguo Zhou, Chang Liu, Alois Knoll

TL;DR

This paper tackles path planning in urban autonomous driving with a vision-centric approach that uses 2D object bounding boxes detected from multi-view cameras, together with HD maps. The method merges perception and planning through a dual-embedding Graph Neural Network and Transformer-based temporal-spatial aggregation to produce future waypoints. It trains perception with the $L_1$ imitation loss and planning with auxiliary safety/comfort terms, and is evaluated on the nuPlan planning task. Results show competitive performance against existing vision-centric methods, highlighting real-time capability and robustness in complex urban scenarios.
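
To make the $L_1$ imitation objective concrete, here is a minimal sketch of a mean absolute error between predicted and expert waypoints. The function name, the `(x, y)` waypoint layout, and the averaging over all coordinates are illustrative assumptions; the paper's exact normalization and its auxiliary terms are not reproduced here.

```python
def l1_imitation_loss(predicted, expert):
    """Mean absolute error between predicted and expert (x, y) waypoints.

    Assumption: both trajectories are lists of (x, y) tuples of equal length,
    and the loss is averaged over every coordinate.
    """
    if len(predicted) != len(expert) or not predicted:
        raise ValueError("trajectories must be non-empty and equally long")
    total = 0.0
    for (px, py), (ex, ey) in zip(predicted, expert):
        total += abs(px - ex) + abs(py - ey)
    return total / (2 * len(predicted))


# Toy trajectories (hypothetical values, not taken from nuPlan).
predicted = [(1.0, 0.0), (2.0, 0.5), (3.0, 1.0)]
expert = [(1.0, 0.0), (2.0, 0.0), (3.5, 1.0)]
loss = l1_imitation_loss(predicted, expert)
```

A loss of zero means the predicted waypoints coincide with the expert demonstration; larger values penalize deviations linearly, without the extra weight on outliers that a squared error would add.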

Abstract

The implementation of Autonomous Driving (AD) technologies within urban environments presents significant challenges. These challenges necessitate the development of advanced perception systems and motion planning algorithms capable of managing situations of considerable complexity. Although the end-to-end AD method utilizing LiDAR sensors has achieved significant success in this scenario, we argue that its drawbacks may hinder its practical application. Instead, we propose the vision-centric AD as a promising alternative offering a streamlined model without compromising performance. In this study, we present a path planning method that utilizes 2D bounding boxes of objects, developed through imitation learning in urban driving scenarios. This is achieved by integrating high-definition (HD) map data with images captured by surrounding cameras. Subsequent perception tasks involve bounding-box detection and tracking, while the planning phase employs both local embeddings via Graph Neural Network (GNN) and global embeddings via Transformer for temporal-spatial feature aggregation, ultimately producing optimal path planning information. We evaluated our model on the nuPlan planning task and observed that it performs competitively in comparison to existing vision-centric methods.
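
The dual-embedding idea in the abstract (local embeddings from a GNN, then global embeddings from a Transformer) can be sketched as one round of message passing over a fully connected graph followed by self-attention. This is a simplified illustration under stated assumptions, not the authors' implementation: it has no learned weights, uses mean aggregation for the graph step, and uses a single attention head with identity projections. All function names are hypothetical.

```python
import math


def local_embedding(nodes):
    """One round of mean-aggregation message passing on a fully connected graph.

    Assumption: each node adds the mean of all other nodes' features to its own.
    """
    n, d = len(nodes), len(nodes[0])
    out = []
    for i in range(n):
        others = [nodes[j] for j in range(n) if j != i]
        if others:
            mean = [sum(v[k] for v in others) / len(others) for k in range(d)]
        else:
            mean = [0.0] * d
        out.append([nodes[i][k] + mean[k] for k in range(d)])
    return out


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def global_embedding(nodes):
    """Single-head scaled dot-product self-attention with identity projections."""
    d = len(nodes[0])
    out = []
    for q in nodes:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in nodes]
        w = softmax(scores)
        out.append([sum(w[j] * nodes[j][k] for j in range(len(nodes)))
                    for k in range(d)])
    return out


# Toy 2D features for ego, one agent, and one map element (hypothetical values).
features = [[0.0, 0.0], [1.0, 2.0], [3.0, -1.0]]
local = local_embedding(features)
fused = global_embedding(local)
```

In a trained model, both steps would carry learned projections and the fused vectors would feed a decoder that outputs waypoints; the sketch only shows how information flows between scene elements.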

Paper Structure (15 sections, 5 figures)

Figures (5)

  • Figure 1: An example showing a vehicle equipped with multiple cameras capturing the surrounding environment.
  • Figure 2: General framework of the proposed network. For perception, the YOLO backbone ensures fast and reliable bounding-box detection and tracking. For planning, combined with HD Map and Ego Pose, the prior knowledge is fed into a dual-embedding GNN to generate the final trajectory.
  • Figure 3: The image matrix is input into a YOLO backbone, from which bounding box coordinates, class identifications, and tracking information are generated. This process facilitates the accurate detection and classification of agents within the image, as well as their tracking over time.
  • Figure 4: The planning framework integrates data from perception systems, maps, and ego-motion sensors. Each of these inputs, together with positional encodings, is fed independently into a fully-connected GNN. The output vectors from the GNN are then processed through a multi-head attention layer, which produces the trajectory predictions.
  • Figure 5: Available autonomous driving datasets.