Cognitive-Hierarchy Guided End-to-End Planning for Autonomous Driving

Zhennan Wang, Jianing Teng, Canqun Xiang, Kangliang Chen, Xing Pan, Lu Deng, Weihao Gu

TL;DR

CogAD addresses the gap between end-to-end autonomous driving and human cognitive processes by introducing a cognitively inspired, hierarchical framework. It combines hierarchical perception (global BEV context followed by instance-level refinement) with hierarchical planning (intent-driven high-level decisions followed by trajectory-level generation), underpinned by dual uncertainty modeling through online trajectory anchors and shared motion mode embeddings. Cross-task instance interactions and BEV adapters fuse scene-wide context with object-level details, enabling diverse yet plausible multi-modal trajectories. Empirically, CogAD achieves state-of-the-art results on nuScenes and Bench2Drive, with strong generalization to long-tail and complex real-world scenarios, while maintaining efficiency and not relying on ego state or history inputs.
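
The TL;DR does not spell out how intent-level and trajectory-level uncertainty are combined. One plausible reading of "shared motion mode embeddings" is sketched below in plain Python: every intent query is offset by each entry of a single mode-embedding table, which yields an intents × modes grid of trajectory queries. The function name `build_trajectory_queries`, the additive combination, and the toy vectors are assumptions made for illustration, not CogAD's published formulation, and the online trajectory anchors are not modeled at all.

```python
from typing import List

Vector = List[float]


def build_trajectory_queries(intent_queries: List[Vector],
                             mode_embeddings: List[Vector]) -> List[List[Vector]]:
    """Return a [num_intents][num_modes] grid of trajectory query vectors.

    Hypothetical scheme: each query is the element-wise sum of one intent
    query and one mode embedding, where the mode table is shared by all intents.
    """
    if not intent_queries or not mode_embeddings:
        raise ValueError("need at least one intent query and one mode embedding")
    dim = len(intent_queries[0])
    if any(len(v) != dim for v in intent_queries + mode_embeddings):
        raise ValueError("all vectors must have the same dimension")
    return [
        [[a + b for a, b in zip(intent, mode)] for mode in mode_embeddings]
        for intent in intent_queries
    ]


# Toy usage: 3 intents, 2 shared modes, 2-D embeddings (values are made up).
intents = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
modes = [[0.1, 0.1], [0.5, -0.5]]
queries = build_trajectory_queries(intents, modes)
print(len(queries), len(queries[0]))  # 3 2
print(queries[2][1])                  # [1.5, 0.5]
```

Because the mode table is shared, the offset between mode 0 and mode 1 is the same under every intent, so a single set of mode embeddings can express trajectory-level variation for any high-level decision.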

Abstract

While end-to-end autonomous driving has advanced significantly, prevailing methods remain fundamentally misaligned with human cognitive principles in both perception and planning. In this paper, we propose CogAD, a novel end-to-end autonomous driving model that emulates the hierarchical cognition mechanisms of human drivers. CogAD implements dual hierarchical mechanisms: global-to-local context processing for human-like perception and intent-conditioned multi-mode trajectory generation for cognitively inspired planning. The proposed method demonstrates three principal advantages: comprehensive environmental understanding through hierarchical perception, robust planning exploration enabled by multi-level planning, and diverse yet reasonable multi-modal trajectory generation facilitated by dual-level uncertainty modeling. Extensive experiments on nuScenes and Bench2Drive demonstrate that CogAD achieves state-of-the-art performance in end-to-end planning, exhibiting particular superiority in long-tail scenarios and robust generalization to complex real-world driving conditions.
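
To make "global-to-local context processing" concrete, here is a minimal sketch, assuming a single scaled dot-product attention step in which one instance query attends over every BEV cell (global context) and the pooled result is added back to the query as a residual (local refinement). The name `refine_instance`, the use of raw BEV features as both keys and values, and the absence of learned projections are illustrative simplifications; CogAD's BEV adapters and cross-task interaction layers are not reproduced here.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def refine_instance(query: Vector, bev_cells: List[Vector]) -> Tuple[Vector, Vector]:
    """One global-to-local step (illustrative, not CogAD's exact layer).

    The instance query attends over every BEV cell (global context); the
    attention-weighted BEV feature is added back to the query (local refinement).
    Raw BEV features serve as both keys and values; no learned projections.
    Returns (refined_query, attention_weights).
    """
    dim = len(query)
    scale = 1.0 / math.sqrt(dim)
    scores = [scale * sum(q * k for q, k in zip(query, cell)) for cell in bev_cells]
    weights = softmax(scores)
    context = [sum(w * cell[d] for w, cell in zip(weights, bev_cells)) for d in range(dim)]
    refined = [q + c for q, c in zip(query, context)]  # residual update
    return refined, weights


# Toy usage: a 2x2 BEV grid flattened to 4 cells with 2-D features (made up).
bev = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
query = [4.0, 0.0]
refined, weights = refine_instance(query, bev)
print(weights.index(max(weights)))  # 0: the query focuses on the first cell
print(round(sum(weights), 6))       # 1.0
```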

Paper Structure

This paper contains 30 sections, 6 equations, 10 figures, 6 tables.

Figures (10)

  • Figure 1: Human drivers scan their surroundings (c) before focusing on key objects (d), and plan hierarchically from intent to trajectory (b).
  • Figure 2: The overall framework of CogAD. CogAD extracts BEV features into task-specific queries, then performs cross-task instance feature interaction, forming a hierarchical perception paradigm. Meanwhile, CogAD implements intent-level planning and subsequently conducts trajectory-level planning, establishing a hierarchical planning mechanism.
  • Figure 3: Intent-level and Trajectory-level uncertainty modeling.
  • Figure 4: Qualitative results of CogAD on nuScenes. The 2nd column shows the Top-3 multi-mode trajectories of the highest-probability intent, drawn in red, orange, and yellow, respectively. The 3rd column displays the Top-10 multi-intent trajectories with the highest-probability mode. The GT trajectory is drawn in green.
  • Figure 5: Qualitative results of CogAD on Bench2Drive.
  • ...and 5 more figures
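
The two selection rules in the Figure 4 caption can be sketched as follows, assuming the model outputs one probability per intent and, for each intent, one probability per motion mode. `top_modes_of_best_intent` and `top_intents_with_best_mode` are hypothetical helpers, and "with the highest-probability mode" is read here as pairing each intent with its own most probable mode; that is one interpretation of the caption, not a confirmed detail of CogAD.

```python
from typing import List, Tuple


def argmax(values: List[float]) -> int:
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def top_modes_of_best_intent(intent_probs: List[float],
                             mode_probs: List[List[float]],
                             k: int = 3) -> List[Tuple[int, int]]:
    """(intent, mode) pairs for the k most probable modes of the most probable intent."""
    best = argmax(intent_probs)
    ranked = sorted(range(len(mode_probs[best])),
                    key=lambda m: mode_probs[best][m], reverse=True)
    return [(best, m) for m in ranked[:k]]


def top_intents_with_best_mode(intent_probs: List[float],
                               mode_probs: List[List[float]],
                               k: int = 10) -> List[Tuple[int, int]]:
    """(intent, mode) pairs for the k most probable intents, each with its own best mode."""
    ranked = sorted(range(len(intent_probs)),
                    key=lambda i: intent_probs[i], reverse=True)
    return [(i, argmax(mode_probs[i])) for i in ranked[:k]]


# Toy usage: 3 intents x 4 modes (probabilities are made up).
intent_probs = [0.1, 0.6, 0.3]
mode_probs = [
    [0.2, 0.5, 0.3, 0.0],
    [0.1, 0.2, 0.4, 0.3],
    [0.7, 0.1, 0.1, 0.1],
]
print(top_modes_of_best_intent(intent_probs, mode_probs))    # [(1, 2), (1, 3), (1, 1)]
print(top_intents_with_best_mode(intent_probs, mode_probs))  # [(1, 2), (2, 0), (0, 1)]
```

With only three toy intents, `k=10` simply returns all of them; ranking by the joint score P(intent) · P(mode | intent) would be an equally plausible alternative that the caption does not rule out.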