FASIONAD : FAst and Slow FusION Thinking Systems for Human-Like Autonomous Driving with Adaptive Feedback

Kangan Qian, Zhikun Ma, Yangfan He, Ziang Luo, Tianyu Shi, Tianze Zhu, Jiayin Li, Jianhui Wang, Ziyu Chen, Xiao He, Yining Shi, Zheng Fu, Xinyu Jiao, Kun Jiang, Diange Yang, Takafumi Matsumaru

TL;DR

FASIONAD is a novel dual-system framework inspired by the cognitive model "Thinking, Fast and Slow." It achieves state-of-the-art performance on a new benchmark derived from the nuScenes dataset and designed specifically to differentiate fast and slow scenarios.

Abstract

Ensuring safe, comfortable, and efficient navigation is a critical goal for autonomous driving systems. While end-to-end models trained on large-scale datasets excel in common driving scenarios, they often struggle with rare, long-tail events. Recent progress in large language models (LLMs) has introduced enhanced reasoning capabilities, but their computational demands pose challenges for real-time decision-making and precise planning. This paper presents FASIONAD, a novel dual-system framework inspired by the cognitive model "Thinking, Fast and Slow." The fast system handles routine navigation tasks using rapid, data-driven path planning, while the slow system focuses on complex reasoning and decision-making in challenging or unfamiliar situations. A dynamic switching mechanism based on score distribution and feedback allows seamless transitions between the two systems. Visual prompts generated by the fast system enable human-like reasoning in the slow system, which provides high-quality feedback to enhance the fast system's decision-making. To evaluate FASIONAD, we introduce a new benchmark derived from the nuScenes dataset, specifically designed to differentiate fast and slow scenarios. FASIONAD achieves state-of-the-art performance on this benchmark, establishing a new standard for frameworks integrating fast and slow cognitive processes in autonomous driving. This approach paves the way for more adaptive, human-like autonomous driving systems.

Paper Structure

This paper contains 48 sections, 32 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: The motivation for FASIONAD. Conventional E2E methods struggle with interpretability and generalization. LLM-based methods face slow decision-making, spatial positioning issues, and potential hallucinations. The dual-system pipeline [tian2024drivevlm] uses LLMs to fuse planning but lacks a safety feedback mechanism. We compare different motion planning methods for autonomous driving, showcasing our method's ability to make adaptive, context-aware decisions while offering better explanation and feedback.
  • Figure 2: The framework operates through dual pathways: a fast pathway and a slow pathway. The fast pathway encodes image information into instance tokens (E, B, M, and A denote ego, BEV, map, and agent tokens, respectively) and generates multiple trajectories via a planning head. A reward model selects the optimal trajectory, while uncertainty estimation determines whether the slow pathway is activated. When engaged, the slow pathway uses VLM feedback, which is integrated both as augmented instance token queries and as scene-derived planning state vectors, enabling trajectory refinement through the planning head.
  • Figure 3: The Adaptive Feedback mechanism processes dual inputs: trajectory-generated images and BEV prompts derived from instance tokens, both feeding into a VLM. The VLM generates three distinct outputs: scene descriptions, analyses, and high-level plans, alongside planning state vectors that capture scene conditions. High-level plans are integrated into planning reflection, which modulates ego tokens, while planning state vectors are channeled through an information bottleneck to refine instance tokens.
  • Figure 4: Example scenarios demonstrating FASIONAD's adaptive feedback framework in various driving environments. Each scene shows different navigation challenges, including obstacles, lane adjustments, and turns. The proposed system provides suggested driving operations and ensures safe, smooth trajectories with minimal abrupt maneuvers, enhancing navigation performance and safety in complex situations.
  • Figure 5: FASIONAD's fast pathway. We adopt the BEVFormer approach [li2022bevformer] to extract BEV features, denoted as $\mathbf{B} \in \mathbb{R}^{bs \times H \times W \times C}$, which encapsulate the environmental topology. Self-attention and cross-attention mechanisms are used to model the interactions between different tokens (agent tokens $\textbf{T}_{\textbf{A}}$, map tokens $\textbf{T}_{\textbf{M}}$, and ego tokens $\textbf{T}_{\textbf{E}}$). The generative framework, similar to that in GenAD [GenAD], is employed in the trajectory generator. Finally, the reward model provides a reward for each trajectory and selects the best trajectory as the fast pathway output.
  • ...and 1 more figure
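
The captions for Figures 2 and 5 describe a reward model that scores candidate trajectories and an uncertainty estimate that decides whether the slow pathway is activated. The sketch below illustrates one way such a gate could work. It is not the authors' implementation: the use of normalized softmax entropy as the uncertainty measure, the function names, and the `entropy_threshold` value are all assumptions made for illustration, since the text here only states that switching is based on the score distribution and feedback.

```python
import math


def softmax(scores):
    """Convert raw reward scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def normalized_entropy(probs):
    """Shannon entropy scaled to [0, 1]; 1 means the scores are uniform."""
    if len(probs) < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(len(probs))


def select_and_route(reward_scores, entropy_threshold=0.8):
    """Pick the highest-reward trajectory and decide whether to escalate.

    ASSUMPTION: entropy of the softmaxed rewards stands in for the
    paper's uncertainty estimate; the threshold is a hypothetical value.
    Returns (best_index, uncertainty, use_slow_pathway).
    """
    probs = softmax(reward_scores)
    best_index = max(range(len(reward_scores)), key=lambda i: reward_scores[i])
    uncertainty = normalized_entropy(probs)
    return best_index, uncertainty, uncertainty > entropy_threshold


if __name__ == "__main__":
    # One trajectory clearly dominates: stay on the fast pathway.
    print(select_and_route([5.0, 0.0, 0.0]))
    # All trajectories score alike: the fast system is unsure, so escalate.
    print(select_and_route([1.0, 1.0, 1.0]))
```

In this sketch, a peaked score distribution keeps the fast pathway's choice, while a flat distribution (no candidate clearly preferred) triggers the slow pathway. Any calibrated uncertainty measure could replace the entropy term.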