FASIONAD++ : Integrating High-Level Instruction and Information Bottleneck in FAt-Slow fusION Systems for Enhanced Safety in Autonomous Driving with Adaptive Feedback

Kangan Qian, Ziang Luo, Sicong Jiang, Zilin Huang, Jinyu Miao, Zhikun Ma, Tianze Zhu, Jiayin Li, Yangfan He, Zheng Fu, Yining Shi, Boyue Wang, Hezhe Lin, Ziyu Chen, Jiangbo Yu, Xinyu Jiao, Mengmeng Yang, Kun Jiang, Diange Yang

TL;DR

FASIONAD addresses the gap between fast, reliable end-to-end planning and slow, high-level reasoning by introducing a dual-system autonomous driving framework. It uses adaptive uncertainty switching, an information bottleneck, and high-level action guidance to enable targeted VLM feedback that improves planning while preserving real-time performance. Planning-oriented QA and reward-guided VLM tuning, together with visual BEV prompts, enhance interpretability and reliability in complex scenarios. Empirical results across nuScenes, Town05 Short, and Bench2Drive demonstrate improved trajectory accuracy and reduced collision rates, with strong explainability and notable computational efficiency gains.

Abstract

Ensuring safe, comfortable, and efficient planning is crucial for autonomous driving systems. While end-to-end models trained on large datasets perform well in standard driving scenarios, they struggle with complex low-frequency events. Recent advances in Large Language Models (LLMs) and Vision Language Models (VLMs) offer enhanced reasoning but suffer from computational inefficiency. Inspired by the dual-process cognitive model "Thinking, Fast and Slow", we propose FASIONAD, a novel dual-system framework that synergizes a fast end-to-end planner with a VLM-based reasoning module. The fast system leverages end-to-end learning to achieve real-time trajectory generation in common scenarios, while the slow system is activated through uncertainty estimation to perform contextual analysis and resolve complex scenarios. Our architecture introduces three key innovations: (1) a dynamic switching mechanism that enables slow-system intervention based on real-time uncertainty assessment; (2) an information bottleneck with high-level plan feedback that optimizes the slow system's guidance capability; (3) a bidirectional knowledge exchange in which visual prompts enhance the slow system's reasoning while its feedback refines the fast planner's decision-making. To strengthen VLM reasoning, we develop a question-answering mechanism coupled with a reward-instruct training strategy. In open-loop experiments, FASIONAD achieves a 6.7% reduction in average L2 trajectory error and a 28.1% lower collision rate.
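The uncertainty-based switching described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the fast planner emits K candidate trajectories scored by the reward model and that uncertainty is measured as the normalized entropy of a softmax over those scores; the abstract does not specify FASIONAD's actual uncertainty estimator, and the names `select_and_gate` and `threshold` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over reward-model scores (illustrative)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def normalized_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy divided by log(K), so the result lies in [0, 1]."""
    k = len(probs)
    if k < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(k)


def select_and_gate(scores: Sequence[float], threshold: float = 0.8) -> Tuple[int, float, bool]:
    """Pick the best-scoring trajectory mode and decide whether to invoke the slow system.

    Returns (best_index, uncertainty, activate_slow).
    `threshold` is a hypothetical hyperparameter, not a value from the paper.
    """
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    uncertainty = normalized_entropy(probs)
    return best, uncertainty, uncertainty > threshold


# Usage: one dominant mode keeps the fast path; near-uniform scores trigger the slow system.
print(select_and_gate([5.0, 0.1, -1.0]))
print(select_and_gate([1.0, 1.0, 1.0]))
```

With a dominant score the gate stays closed and the fast trajectory is used directly; with near-uniform scores the normalized entropy approaches 1 and the slow system would be invoked. Dividing by log K keeps a single threshold meaningful regardless of how many trajectory modes the planner produces.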

Paper Structure

This paper contains 22 sections, 6 equations, 4 figures, 7 tables.

Figures (4)

  • Figure 1: Motivation for FASIONAD. Conventional E2E methods struggle with interpretability and generalization. LLM-based methods face slow decision-making, spatial positioning issues, and potential hallucinations. We compare different motion planning methods for autonomous driving, showcasing our method’s ability to make adaptive, context-aware decisions and to provide better explanations and feedback.
  • Figure 2: The framework operates through two systems, fast and slow. The fast system encodes image information into instance tokens (E, B, M, and A denote ego tokens, BEV tokens, map tokens, and agent tokens, respectively) and generates multi-modal trajectories via a planning head. A reward model selects the optimal trajectory, while uncertainty estimation determines whether the slow system is activated. When engaged, the slow system provides VLM feedback that is integrated both as a high-level action (HA) and, through the information bottleneck (IB), as scene-derived planning state vectors, enabling trajectory refinement through the planning head.
  • Figure 3: The adaptive feedback mechanism feeds two inputs, visual prompts and BEV prompts, into a VLM. The VLM produces three outputs (scene descriptions, detailed analyses, and high-level plans), along with planning state vectors that encapsulate scene conditions. High-level plans are embedded into ego tokens, whereas planning state vectors pass through an IB to refine the environment information in query tokens.
  • Figure 4: Example scenarios demonstrating FASIONAD's adaptive feedback framework in various driving environments. Each scene shows different navigation challenges, including obstacles, lane adjustments, and turns. The proposed system provides suggested driving operations and ensures safe, smooth trajectories with minimal abrupt maneuvers, enhancing safety in complex situations.
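The feedback path in Figures 2 and 3 (high-level plans embedded into ego tokens, planning state vectors passed through an IB into query tokens) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not FASIONAD's implementation: it assumes a standard variational information bottleneck (reparameterized Gaussian sample with a KL penalty toward N(0, I)), additive fusion into the tokens, and a toy action vocabulary; `HA_EMBEDDINGS`, `vib_sample`, and `fuse_feedback` are hypothetical names.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]

# Hypothetical high-level action (HA) vocabulary with illustrative 4-d embeddings.
HA_EMBEDDINGS: Dict[str, Vector] = {
    "keep_lane": [0.0, 0.0, 0.0, 0.0],
    "turn_left": [1.0, 0.0, 0.0, 0.0],
    "turn_right": [0.0, 1.0, 0.0, 0.0],
    "stop": [0.0, 0.0, 1.0, 0.0],
}


def vib_sample(mu: Sequence[float], logvar: Sequence[float],
               rng: random.Random) -> Tuple[Vector, float]:
    """Reparameterized sample z = mu + sigma * eps and its KL divergence to N(0, I).

    Penalizing the KL term limits how much of the VLM's planning state
    passes through the bottleneck (assumed variational IB formulation).
    """
    z = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]
    kl = 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))
    return z, kl


def fuse_feedback(ego: Vector, queries: List[Vector], action: str,
                  mu: Sequence[float], logvar: Sequence[float],
                  rng: random.Random) -> Tuple[Vector, List[Vector], float]:
    """Add the HA embedding to the ego token and the bottlenecked state to each query token."""
    ego_out = [e + h for e, h in zip(ego, HA_EMBEDDINGS[action])]
    z, kl = vib_sample(mu, logvar, rng)
    queries_out = [[q + zi for q, zi in zip(qv, z)] for qv in queries]
    return ego_out, queries_out, kl


# Usage: a "turn_left" plan shifts the ego token; the KL value would be weighted
# by a hypothetical coefficient beta and added to the planning loss during training.
ego, queries, kl = fuse_feedback([0.0] * 4, [[0.0] * 4, [1.0] * 4], "turn_left",
                                 [0.0] * 4, [0.0] * 4, random.Random(0))
print(ego, kl)
```

The KL penalty is what makes this a bottleneck: pushing the posterior toward the prior forces the slow system to transmit only the scene information that actually helps planning, rather than the full VLM state.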