Perception Helps Planning: Facilitating Multi-Stage Lane-Level Integration via Double-Edge Structures

Guoliang You, Xiaomeng Chu, Yifan Duan, Wenyu Zhang, Xingchen Li, Sha Zhang, Yao Li, Jianmin Ji, Yanyong Zhang

TL;DR

This paper proposes Perception Helps Planning (PHP), a novel framework that reconciles lane-level planning with perception so that planning is inherently aligned with traffic constraints, facilitating safe and efficient driving.

Abstract

When planning for autonomous driving, it is crucial to consider essential traffic elements such as lanes, intersections, traffic regulations, and dynamic agents. However, traditional end-to-end planning methods often overlook these elements, which is likely to cause inefficiencies and non-compliance with traffic regulations. In this work, we integrate the perception of these elements into the planning task. To this end, we propose Perception Helps Planning (PHP), a novel framework that reconciles lane-level planning with perception. This integration ensures that planning is inherently aligned with traffic constraints, thus facilitating safe and efficient driving. Specifically, PHP focuses on both edges of a lane for planning and perception, taking into account the 3D positions of both lane edges and attributes for lane intersections, lane directions, lane occupancy, and planning. In the algorithmic design, a transformer first encodes multi-camera images to extract these features and predict lane-level perception results. Next, the hierarchical feature early-fusion module refines the features for predicting planning attributes. Finally, the double-edge interpreter applies a late-fusion process designed to integrate lane-level perception and planning information, culminating in the generation of vehicle control signals. Experiments on three CARLA benchmarks show driving-score improvements of 27.20%, 33.47%, and 15.54%, respectively, over existing algorithms, achieving state-of-the-art performance, with the system operating at up to 22.57 FPS.

Paper Structure (16 sections, 17 equations, 5 figures, 4 tables)

Figures (5)

  • Figure 1: Comparison of autonomous driving frameworks: a) The traditional end-to-end framework prioritizes planning policy optimization without considering perception. b) The sequential integration framework enhances planning by incorporating perception into traditional end-to-end planning but lacks interaction between perception and planning. c) Our Perception Helps Planning (PHP) framework reformulates path planning as a lane-level task, integrating multi-level lane-centric perception at both the feature and result levels.
  • Figure 2: PHP begins with an image encoder that uses a ResNet to extract features $\mathbf{f}$ from multi-camera inputs $\mathbf{I}$, which are sequenced with positional encoding to form the input $\mathbf{v}$ for the transformer. (a) The transformer processes $\mathbf{v}$, extracting lane features $\mathbf{f_{double-edge}}$ and predicting BEV points $\mathit{point^{j}}$. Simultaneously, it extracts intersection features $\mathbf{f_{int}}$, direction features $\mathbf{f_{dir}}$, and occupancy features $\mathbf{f_{occ}}$ through dedicated branches and uses them to predict the attributes $\mathit{int^i}$, $\mathit{dir^i}$, and $\mathit{occ^j}$. (b) The fusion module integrates $\mathbf{f_{int}}$ and $\mathbf{f_{dir}}$ into a probability matrix $\mathbf{f_{int2dir}}$, which, when merged with $\mathbf{f_{occ}}$, forms a comprehensive lane feature $\mathbf{f_{fusion}}$. This feature, combined with $\mathbf{f_{double-edge}}$, generates a planning feature $\mathbf{f_{plan}}$ using a learnable parameter $\gamma$. (c) The target-guided planning branch enhances the interaction between the target point $\mathit{T_p}$ and the features $\mathbf{f_{plan}}$ through attention mechanisms to predict the planning attributes $\mathit{plan^i}$. (d) Finally, the interpreter fuses the perception and planning information at the result level and transforms it into a path, incorporating traffic signals and speed to generate a trajectory for control commands. The symbol $\oplus$ represents point-wise addition, while $\otimes$ denotes matrix multiplication.
  • Figure 3: Visualization of double-edges in a traffic scenario $L$. Blue and green detail the lane-level attributes for intersections and directions. Orange and yellow detail the point-level attributes, marking unoccupied and planning lanes.
  • Figure 4: (a) Each query $\mathbf{q_{double-edge}}$ consists of a pair of edge queries $\mathbf{q_{edge}}$, each comprising a set of query points $\mathbf{q_{pts}}$. (b) A scenario features $N_d$ such $\mathbf{q_{edge}}$ pairs, and each lane within a $\mathbf{q_{edge}}$ contains $\frac{N_p}{2}$ query points.
  • Figure 5: Visualization of PHP, showing multi-camera images from the front, back, left, and right views, along with double-edges that detail intersection lanes, direction lanes, occupancy lanes, and the lanes selected for planning. In the occupancy lane visualization, lanes occupied by vehicles are marked with orange dashed rectangles. Planning lanes are highlighted in yellow on the front camera view, making the intended lane for the vehicle easier to see.
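
The fusion step described in the Figure 2 caption, part (b), can be illustrated with a minimal NumPy sketch. Only the symbol names come from the caption; the concrete operations are assumptions made for illustration, not the authors' implementation. In particular, the sketch assumes that the probability matrix $\mathbf{f_{int2dir}}$ is a row-wise softmax over the scaled product of $\mathbf{f_{int}}$ and $\mathbf{f_{dir}}$, that "merged with $\mathbf{f_{occ}}$" means matrix multiplication ($\otimes$), and that $\gamma$ scales $\mathbf{f_{fusion}}$ before point-wise addition ($\oplus$) to $\mathbf{f_{double-edge}}$. It also treats every feature as one vector per double-edge, which simplifies away the point-level occupancy structure. The function name `fuse_lane_features` and the sizes `N_d = 4`, `C = 8` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def fuse_lane_features(f_int, f_dir, f_occ, f_double_edge, gamma):
    """Hypothetical early-fusion sketch; all inputs have shape (N_d, C).

    Assumed operations (not confirmed by the source):
      f_int2dir = softmax(f_int @ f_dir^T / sqrt(C))   -> (N_d, N_d)
      f_fusion  = f_int2dir @ f_occ                    -> (N_d, C)
      f_plan    = f_double_edge + gamma * f_fusion     -> (N_d, C)
    """
    channels = f_int.shape[-1]
    scores = f_int @ f_dir.T / np.sqrt(channels)
    f_int2dir = softmax(scores, axis=-1)        # each row sums to 1
    f_fusion = f_int2dir @ f_occ                # "merge" assumed to be matmul
    f_plan = f_double_edge + gamma * f_fusion   # point-wise addition
    return f_int2dir, f_fusion, f_plan


# Toy usage with random features (sizes are illustrative only).
rng = np.random.default_rng(0)
N_d, C = 4, 8
f_int = rng.standard_normal((N_d, C))
f_dir = rng.standard_normal((N_d, C))
f_occ = rng.standard_normal((N_d, C))
f_double_edge = rng.standard_normal((N_d, C))

f_int2dir, f_fusion, f_plan = fuse_lane_features(
    f_int, f_dir, f_occ, f_double_edge, gamma=0.0
)
```

Under these assumptions, setting `gamma` to zero makes `f_plan` identical to `f_double_edge`, so the fused perception features act as a residual correction whose strength would be learned during training.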