CALMM-Drive: Confidence-Aware Autonomous Driving with Large Multimodal Model

Ruoyu Yao, Yubin Wang, Haichao Liu, Rui Yang, Zengqi Peng, Lei Zhu, Jun Ma

TL;DR

This work tackles decision-planning misalignment and uncertainty in large multimodal model (LMM) driven autonomous driving by introducing CALMM-Drive, which couples driving-task-oriented Chain-of-Thought reasoning with Top-K confidence elicitation to generate multiple high-level decisions, each with a confidence level. A diffusion-based trajectory generator with a hierarchical refinement pipeline translates each decision into a feasible, smooth trajectory, and a confidence-aware selector balances decision confidence against planning quality through a joint objective. The method defines the per-decision objective $J^k = (J^k_f)^{\omega_f} \cdot (J_g)^{\omega_g}$ and scores each refined trajectory via $S((\bm{X}^k)^*) = (c_t^k)^{\omega_c} \cdot \tilde{J}^k((\bm{X}^k)^*)$, where $c_t^k$ is the confidence of the $k$-th decision, so that the final plan is selected across the $K$ candidates. Closed-loop evaluations on nuPlan under non-reactive and reactive settings show strong long-tail performance and fewer severe driving errors than state-of-the-art diffusion-based and LMM-empowered planners.
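The scoring rule in the TL;DR can be illustrated with a short sketch. This is not the authors' implementation: the `Candidate` class, the field names, the default weights, and in particular the normalization used for $\tilde{J}^k$ (dividing by the largest joint objective among the candidates) are assumptions made for illustration. The summary does not define the subscripts $f$ and $g$, so the sketch simply treats $J^k_f$ and $J_g$ as two positive quality terms evaluated on each refined trajectory.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Candidate:
    """One decision and its refined trajectory's scores (hypothetical container)."""
    decision: str      # high-level decision label
    confidence: float  # c_t^k, confidence elicited from the LMM
    j_f: float         # J_f^k, assumed positive
    j_g: float         # J_g evaluated on this candidate's trajectory, assumed positive


def joint_objective(c: Candidate, w_f: float = 1.0, w_g: float = 1.0) -> float:
    """J^k = (J_f^k)^{w_f} * (J_g)^{w_g}."""
    return (c.j_f ** w_f) * (c.j_g ** w_g)


def select_plan(
    candidates: Sequence[Candidate],
    w_c: float = 1.0,
    w_f: float = 1.0,
    w_g: float = 1.0,
) -> Tuple[int, List[float]]:
    """Return the index of the best candidate and all scores S = c^{w_c} * J~."""
    if not candidates:
        raise ValueError("at least one candidate is required")
    joint = [joint_objective(c, w_f, w_g) for c in candidates]
    j_max = max(joint)
    if j_max <= 0:
        raise ValueError("joint objectives are assumed to be positive")
    # Assumption: J~ is the joint objective rescaled to (0, 1] across candidates.
    scores = [(c.confidence ** w_c) * (j / j_max) for c, j in zip(candidates, joint)]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return best, scores
```

With `w_c = 0` the confidence term drops out and the selector reduces to picking the trajectory with the highest planning objective; larger `w_c` shifts weight toward the decision the LMM is most confident in.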

Abstract

Decision-making and motion planning constitute critical components for ensuring the safety and efficiency of autonomous vehicles (AVs). Existing methodologies typically adopt one of two paradigms: decision-then-planning or generation-then-scoring. However, the former often suffers from decision-planning misalignment, which can lead to risky situations. Meanwhile, the latter struggles to balance short-term operational metrics (e.g., immediate motion smoothness) with long-term tactical goals (e.g., route efficiency), resulting in myopic or overly conservative behaviors. To address these issues, we introduce CALMM-Drive, a novel Confidence-Aware Large Multimodal Model (LMM) empowered Autonomous Driving framework. Our approach integrates driving-task-oriented Chain-of-Thought (CoT) reasoning coupled with Top-K confidence elicitation, which facilitates high-level reasoning to generate multiple candidate decisions with their confidence levels. Furthermore, we propose a novel planning module that integrates a diffusion model for trajectory generation and a hierarchical refinement process to find the optimal trajectory. This framework enables selection over trajectory candidates that accounts for both low-level solution quality and high-level tactical confidence, which avoids the risks of one-shot decisions and overcomes the limitations of short-sighted scoring mechanisms. Comprehensive evaluations in nuPlan closed-loop simulation environments demonstrate the competitive performance of CALMM-Drive across both common and long-tail benchmarks, showcasing a significant advancement in the integration of uncertainty in LMM-empowered AVs. The code will be released upon acceptance.

Paper Structure

This paper contains 20 sections, 8 equations, 7 figures, 6 tables.

Figures (7)

  • Figure 1: An illustration of decision-making and motion planning paradigms. In the shown instance: (a) The downstream planner fails to produce a trajectory that closely aligns with the high-level decision of accelerating to the rightmost lane. The trajectory deviation poses a collision risk with the pedestrians. (b) The scoring process prioritizes an overly conservative trajectory because it fails to account for long-term efficiency beyond the planning horizon (i.e., avoiding waiting for all the pedestrians to pass). (c) Our approach explicitly incorporates multimodal decisions with confidence levels, enabling decision-guided trajectory generation and confidence-aware scoring and selection, which allows finding the best plan that balances decision-making confidence and motion planning quality.
  • Figure 2: The pipeline of CALMM-Drive. It consists of a decision-making module based on an LMM and a motion planning module integrating the process of decision-guided trajectory generation and hierarchical refinement. A BEV image and textual description are sent to the LMM for scenario comprehension and Top-K confident decision reasoning. The motion planning objectives of candidate decisions are then created, guiding the generation and refinement of decision-conformed trajectory proposals via gradient-free diffusion-based optimization. The optimal proposal for each decision is sent to a confidence-aware trajectory selector to determine the best plan.
  • Figure 3: An illustration of different objects presented on the BEV. Annotation specifications are given by the system message.
  • Figure 4: An example of forking paths in the generation process of LMMs. Owing to the stochasticity in an intermediate reasoning step, the final response made by the LMM can be different in multiple dialogs with the same question.
  • Figure 5: A comparison between our approach and Diffusion-ES in two representative driving scenarios. At the critical steps, compared to the Diffusion-ES planner, the LMM-empowered system enables more flexible driving behaviors and complies with common sense in real-world driving. Higher confidences are assigned to favorable tactical decisions, which prevents serious driving errors caused by rule-based scoring functions.
  • ...and 2 more figures