CALMM-Drive: Confidence-Aware Autonomous Driving with Large Multimodal Model
Ruoyu Yao, Yubin Wang, Haichao Liu, Rui Yang, Zengqi Peng, Lei Zhu, Jun Ma
TL;DR
This work tackles decision-planning misalignment and uncertainty in large multimodal model (LMM) driven autonomous driving by introducing CALMM-Drive, which couples driving-task-oriented Chain-of-Thought reasoning with Top-K confidence elicitation to generate multiple high-level decisions, each with a confidence level. A diffusion-based trajectory generator with a hierarchical refinement pipeline translates these decisions into feasible, smooth trajectories, and a confidence-aware selector balances decision confidence against planning quality. The method defines a joint objective $J^k = (J^k_f)^{\omega_f} \cdot (J_g)^{\omega_g}$ for the $k$-th candidate and scores its optimized trajectory via $S((\bm{X}^k)^*) = (c_t^k)^{\omega_c} \cdot \tilde{J}^k((\bm{X}^k)^*)$, which weights the decision confidence $c_t^k$ against planning quality and enables selection across the $K$ candidates. Evaluations on nuPlan closed-loop benchmarks under non-reactive and reactive settings show strong long-tail performance and fewer severe driving errors than state-of-the-art diffusion-based and LMM-empowered planners, suggesting practical value for safer, more reliable autonomous driving systems.
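The selection rule above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that both objective terms are positive with higher values being better, and that $\tilde{J}^k$ is $J^k$ divided by the maximum over the $K$ candidates; the summary does not specify the normalization, so that choice is a guess. The names `Candidate`, `joint_objective`, and `select`, the decision labels, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    decision: str      # high-level decision label (hypothetical examples below)
    confidence: float  # elicited confidence c^k, assumed to lie in [0, 1]
    j_f: float         # objective term J_f^k (assumed positive, higher is better)
    j_g: float         # objective term J_g (assumed positive, higher is better)


def joint_objective(c: Candidate, w_f: float = 1.0, w_g: float = 1.0) -> float:
    """J^k = (J_f^k)^w_f * (J_g)^w_g."""
    return (c.j_f ** w_f) * (c.j_g ** w_g)


def select(candidates, w_c: float = 1.0, w_f: float = 1.0, w_g: float = 1.0):
    """Return the best candidate and all scores S = (c^k)^w_c * J~^k."""
    joint = [joint_objective(c, w_f, w_g) for c in candidates]
    j_max = max(joint)  # ASSUMPTION: normalize by the best joint objective
    scores = [
        (c.confidence ** w_c) * (j / j_max)
        for c, j in zip(candidates, joint)
    ]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], scores


# Made-up values for K = 3 candidates, for illustration only.
candidates = [
    Candidate("keep lane", confidence=0.6, j_f=0.50, j_g=0.8),
    Candidate("change left", confidence=0.3, j_f=0.90, j_g=0.8),
    Candidate("stop", confidence=0.1, j_f=0.95, j_g=0.4),
]

best, scores = select(candidates)
print(best.decision, scores)
```

With these made-up numbers, the most confident decision ("keep lane") is selected when $\omega_c = 1$, while lowering $\omega_c$ to 0.5 shifts the choice to "change left", whose trajectory has the higher joint objective. This shows how the exponent trades decision confidence against planning quality.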
Abstract
Decision-making and motion planning constitute critical components for ensuring the safety and efficiency of autonomous vehicles (AVs). Existing methodologies typically adopt one of two paradigms: decision-then-planning or generation-then-scoring. However, the former often suffers from decision-planning misalignment, which can lead to risky situations. Meanwhile, the latter struggles to balance short-term operational metrics (e.g., immediate motion smoothness) with long-term tactical goals (e.g., route efficiency), resulting in myopic or overly conservative behaviors. To address these issues, we introduce CALMM-Drive, a novel Confidence-Aware Large Multimodal Model (LMM) empowered Autonomous Driving framework. Our approach integrates driving-task-oriented Chain-of-Thought (CoT) reasoning with Top-K confidence elicitation, which enables high-level reasoning to generate multiple candidate decisions together with their confidence levels. Furthermore, we propose a novel planning module that combines a diffusion model for trajectory generation with a hierarchical refinement process to find the optimal trajectory. This framework selects among trajectory candidates by accounting for both low-level solution quality and high-level tactical confidence, which avoids the risks of one-shot decisions and overcomes the limitations of short-sighted scoring mechanisms. Comprehensive evaluations in nuPlan closed-loop simulation environments demonstrate the competitive performance of CALMM-Drive on both common and long-tail benchmarks, showing the benefit of integrating uncertainty into LMM-empowered AVs. The code will be released upon acceptance.
