
GraphAD: Interaction Scene Graph for End-to-end Autonomous Driving

Yunpeng Zhang, Deheng Qian, Ding Li, Yifeng Pan, Yong Chen, Zhenbao Liang, Zhiyao Zhang, Shurui Zhang, Hongxu Li, Maolei Fu, Yun Ye, Zhujin Liang, Yi Shan, Dalong Du

TL;DR

The paper tackles safety-critical end-to-end autonomous driving by explicitly modeling heterogeneous interactions among the ego-vehicle, other road agents, and map elements using an Interaction Scene Graph (ISG). GraphAD combines a spatiotemporal Bird's-Eye-View (BEV) representation with TrackFormer and MapFormer to extract dynamic and static elements, then iteratively refines agent features through Dynamic Scene Graph (DSG) and Static Scene Graph (SSG) connections built from trajectory-based geometric priors. A planning head with ego-status encoding and occupancy-based post-optimization generates the ego trajectory end to end with improved safety, and the method achieves state-of-the-art results on nuScenes across perception, prediction, and planning. Extensive ablations demonstrate the effectiveness of the dynamic and static graph interactions, the superiority of trajectory-based graph distances and MLP-based aggregation, and the value of the ego-aware planning components.
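The TL;DR credits trajectory-based graph distances for building the DSG. The following is a minimal sketch of that idea under stated assumptions: each agent has a single predicted trajectory sampled at shared timesteps (Figure 3 indicates that GraphAD reasons per trajectory mode, which is omitted here), the distance between two agents is their closest approach at matching timesteps, and each agent keeps its k nearest agents as neighbors. The function names and the value of k are hypothetical and not taken from the GraphAD code.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Trajectory = List[Point]  # future (x, y) waypoints, one per timestep


def trajectory_distance(a: Trajectory, b: Trajectory) -> float:
    """Closest approach between two agents over aligned future timesteps.

    Assumption: both trajectories share the same time step, so a[t] and b[t]
    are where the two agents would be at the same moment. zip() truncates
    to the shorter horizon.
    """
    return min(math.dist(p, q) for p, q in zip(a, b))


def dsg_neighbors(trajs: List[Trajectory], k: int) -> List[List[int]]:
    """For each agent, return the indices of the k agents it comes closest to."""
    neighbors = []
    for i, ti in enumerate(trajs):
        # Rank every other agent by predicted closest approach (hypothetical rule).
        ranked = sorted(
            (j for j in range(len(trajs)) if j != i),
            key=lambda j: trajectory_distance(ti, trajs[j]),
        )
        neighbors.append(ranked[:k])
    return neighbors


# Toy scene with three agents and a three-step horizon.
trajs = [
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],  # agent 0 drives along +x
    [(4.0, 1.0), (3.0, 0.5), (2.0, 0.0)],  # agent 1 converges on agent 0 at t=2
    [(0.0, 1.0), (0.0, 2.0), (0.0, 3.0)],  # agent 2 starts close but moves away
]
print(dsg_neighbors(trajs, k=1))  # [[1], [0], [0]]
```

Agent 2 is the closest agent to agent 0 at the current timestep, but agent 1 converges on agent 0's future path, so the trajectory-based distance selects agent 1 instead. A distance computed from current positions alone would miss this potential collision.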

Abstract

Modeling the complicated interactions among the ego-vehicle, road agents, and map elements is crucial for safety-critical autonomous driving. Previous works on end-to-end autonomous driving rely on the attention mechanism to handle these heterogeneous interactions, which fails to capture geometric priors and is also computationally intensive. In this paper, we propose the Interaction Scene Graph (ISG) as a unified method to model the interactions among the ego-vehicle, road agents, and map elements. With the ISG representation, each driving agent aggregates essential information from the most influential elements, including the road agents it may collide with and the map elements it should follow. Because many unnecessary interactions are omitted, the more efficient scene-graph-based framework can focus on the indispensable connections, which leads to better performance. We evaluate the proposed method for end-to-end autonomous driving on the nuScenes dataset. Compared with strong baselines, our method achieves significantly better results on the full-stack driving tasks, including perception, prediction, and planning. Code will be released at https://github.com/zhangyp15/GraphAD.
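The abstract contrasts dense attention with aggregating information only from the most influential elements. The following is a minimal sketch of sparse, MLP-based aggregation over a precomputed neighbor list. It assumes a one-layer MLP message function on concatenated node and neighbor features, element-wise max-pooling, and a residual update. The weights, shapes, and function names are hypothetical and are not taken from the GraphAD implementation.

```python
from typing import List, Sequence

Vector = List[float]


def relu(v: Sequence[float]) -> Vector:
    return [max(0.0, x) for x in v]


def linear(weights: List[Vector], bias: Vector, x: Sequence[float]) -> Vector:
    # weights has one row per output unit; each row has len(x) entries.
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def aggregate(features: List[Vector], neighbors: List[List[int]],
              weights: List[Vector], bias: Vector) -> List[Vector]:
    """One round of MLP message passing with max-pooling over graph neighbors."""
    updated = []
    for i, nbrs in enumerate(neighbors):
        # One message per edge: a one-layer MLP on [own feature, neighbor feature].
        messages = [relu(linear(weights, bias, features[i] + features[j])) for j in nbrs]
        if not messages:
            # Isolated node: keep its feature unchanged.
            updated.append(list(features[i]))
            continue
        # Element-wise max over messages, then a residual update.
        pooled = [max(vals) for vals in zip(*messages)]
        updated.append([f + p for f, p in zip(features[i], pooled)])
    return updated


features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbors = [[1], [0, 2], []]  # sparse edges, e.g. from a top-k selection
weights = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]  # 2 outputs x 4 inputs
bias = [0.0, -0.5]
result = aggregate(features, neighbors, weights, bias)
print(result)  # [[3.0, 0.0], [1.0, 2.5], [1.0, 1.0]]
```

Here only 3 messages are computed, whereas dense all-pairs interaction among 3 nodes would need 3 × 2 = 6. The gap widens as the number of elements grows, because the sparse cost scales with N·k instead of N².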

Paper Structure

This paper contains 37 sections, 3 equations, 4 figures, and 7 tables.

Figures (4)

  • Figure 1: The Interaction Scene Graph is composed of the Dynamic Scene Graph (DSG) and the Static Scene Graph (SSG). In the DSG, traffic agents, represented by round nodes, attend to surrounding agents through directed connections. In the SSG, traffic agents reason about their trajectories based on the connected lanes, which are represented by rectangular nodes.
  • Figure 2: GraphAD features graph-based interactions between the structured instances in the driving environment, including dynamic traffic agents and static map elements. GraphAD first constructs a spatiotemporal scene feature in Bird's-Eye-View (BEV) as the unified representation for downstream tasks. It then extracts the structured instances with the TrackFormer and the MapFormer. Taking these instances as graph nodes, GraphAD uses the Interaction Scene Graph to iteratively refine the features of the dynamic nodes by considering inter-agent and agent-map interactions. Finally, the processed node features are used for motion prediction and end-to-end planning.
  • Figure 3: Qualitative visualization of the Dynamic Scene Graph. The agent of interest, marked by the red dot, has six different future trajectory modes. Under each motion intention, this agent interacts with the most influential traffic agents, which are denoted by the connections. Far-away connections are omitted for clarity.
  • Figure 4: Qualitative visualization of the planning trajectories. The images from the six cameras are shown on the left. The predicted trajectories of traffic agents and the planning result of the ego vehicle are shown on the right. The color intensity of each trajectory varies with the probability $p$ and the time $t$. The red arrows highlight the environmental elements most likely to influence the ego vehicle's planning.
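Figure 1 describes the SSG as connecting each traffic agent to the lanes that shape its future trajectory. The following is a minimal sketch of how such agent-lane edges could be selected. It assumes that lanes are polylines, that the agent-lane distance is the smallest point-to-segment distance from any future waypoint to the lane, and that each agent links to its k nearest lanes. All names and the value of k are hypothetical and are not taken from the GraphAD code.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.dist(p, a)  # degenerate segment: a single point
    # Projection parameter of p onto the segment, clamped to [0, 1].
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.dist(p, (ax + t * dx, ay + t * dy))


def traj_to_lane_distance(traj: List[Point], lane: List[Point]) -> float:
    """Smallest distance from any future waypoint to any segment of the lane."""
    segments = list(zip(lane, lane[1:])) or [(lane[0], lane[0])]
    return min(point_segment_distance(p, a, b) for p in traj for a, b in segments)


def ssg_edges(trajs: List[List[Point]], lanes: List[List[Point]], k: int) -> List[List[int]]:
    """For each agent, the indices of the k lanes its future path passes closest to."""
    return [
        sorted(range(len(lanes)), key=lambda m: traj_to_lane_distance(t, lanes[m]))[:k]
        for t in trajs
    ]


lanes = [
    [(0.0, 0.0), (5.0, 0.0), (10.0, 0.0)],  # lane 0: straight ahead along +x
    [(0.0, 3.5), (5.0, 3.5), (10.0, 3.5)],  # lane 1: adjacent lane to the left
    [(5.0, -2.0), (5.0, -10.0)],            # lane 2: branch heading toward -y
]
trajs = [
    [(0.0, 0.2), (4.0, 0.1), (8.0, 0.0)],    # agent 0 keeps to lane 0
    [(4.0, -1.0), (5.0, -4.0), (5.0, -8.0)],  # agent 1 turns onto the branch
]
print(ssg_edges(trajs, lanes, k=1))  # [[0], [2]]
```

Agent 1 is still 1 unit from lane 0 at its first waypoint, but its future path lies on lane 2. Because the distance is measured along the predicted trajectory rather than at the current position alone, agent 1 is linked to the lane it is heading toward.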