AdaDrive: Self-Adaptive Slow-Fast System for Language-Grounded Autonomous Driving

Ruifei Zhang, Junlin Xie, Wei Zhang, Weikai Chen, Xiao Tan, Xiang Wan, Guanbin Li

TL;DR

AdaDrive tackles the challenge of integrating LLMs into language-grounded autonomous driving by learning when to engage the LLM and how much influence it should exert. It introduces a slow-fast architecture with Connector-W for adaptive activation and Connector-H for dynamic fusion, supported by LS-Qformer for long-short visual modeling and a Propagative Memory Fusion memory buffer for streaming data. Training uses a comparative activation loss that links LLM usage to actual gains, achieving state-of-the-art driving scores while reducing inference cost on LangAuto benchmarks. The approach provides practical gains in robustness and efficiency, enabling real-time, context-aware LLM collaboration in autonomous driving.

Abstract

Effectively integrating Large Language Models (LLMs) into autonomous driving requires a balance between leveraging high-level reasoning and maintaining real-time efficiency. Existing approaches either activate LLMs too frequently, causing excessive computational overhead, or use fixed schedules, failing to adapt to dynamic driving conditions. To address these challenges, we propose AdaDrive, an adaptively collaborative slow-fast framework that optimally determines when and how LLMs contribute to decision-making. (1) When to activate the LLM: AdaDrive employs a novel adaptive activation loss that dynamically determines LLM invocation based on a comparative learning mechanism, ensuring activation only in complex or critical scenarios. (2) How to integrate LLM assistance: Instead of rigid binary activation, AdaDrive introduces an adaptive fusion strategy that modulates a continuous, scaled LLM influence based on scene complexity and prediction confidence, ensuring seamless collaboration with conventional planners. Through these strategies, AdaDrive provides a flexible, context-aware framework that maximizes decision accuracy without compromising real-time performance. Extensive experiments on language-grounded autonomous driving benchmarks demonstrate that AdaDrive achieves state-of-the-art performance in terms of both driving accuracy and computational efficiency. Code is available at https://github.com/ReaFly/AdaDrive.
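The "when" and "how" decisions described in the abstract can be pictured with a minimal sketch. It is not the authors' implementation: the names `adaptive_step`, `comparative_gate_target`, `gate_score`, `threshold`, and `margin` are hypothetical, the additive fusion rule is an assumption, and the real connectors are learned modules rather than fixed arithmetic. The sketch only illustrates the control flow: a fast planner always runs, a gate score decides whether the LLM is invoked, and the same score scales how much the LLM output contributes.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class StepResult:
    feature: List[float]  # feature handed to the trajectory head
    llm_used: bool        # whether the slow path ran at this step
    weight: float         # influence given to the LLM feature


def adaptive_step(
    fast_feature: Sequence[float],
    gate_score: float,
    run_llm: Callable[[], Sequence[float]],
    threshold: float = 0.5,
) -> StepResult:
    """One driving step of a hypothetical slow-fast loop.

    gate_score stands in for the output of an activation connector;
    threshold and the additive fusion below are assumptions.
    """
    if not 0.0 <= gate_score <= 1.0:
        raise ValueError("gate_score must lie in [0, 1]")
    if gate_score < threshold:
        # Fast path only: the LLM is never called, so no slow-path cost.
        return StepResult(list(fast_feature), False, 0.0)
    slow_feature = run_llm()
    if len(slow_feature) != len(fast_feature):
        raise ValueError("fast and slow features must have equal length")
    # Continuous fusion: the LLM contribution is scaled, not switched on/off.
    fused = [f + gate_score * s for f, s in zip(fast_feature, slow_feature)]
    return StepResult(fused, True, gate_score)


def comparative_gate_target(
    loss_without_llm: float, loss_with_llm: float, margin: float = 0.0
) -> float:
    """Assumed training label for the gate: 1.0 only if the LLM helped."""
    return 1.0 if loss_without_llm - loss_with_llm > margin else 0.0
```

Passing the LLM as a callable keeps the expensive call lazy, so a skipped step costs nothing on the slow path. The second function shows one way a comparative signal could tie activation to measured gain; the loss actually used in the paper may differ.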

Paper Structure

This paper contains 32 sections, 11 equations, 7 figures, 4 tables.

Figures (7)

  • Figure 1: (a) The first generation of LLM-enhanced autonomous driving approaches [shao2023lmdrive, zhang2024ad] employs a synchronous structure, where both the LLM and the planner operate at each driving step. (b) Generation II methods implement asynchronous processing paradigms, utilizing distinct but predetermined activation frequencies for the LLM and planner. (c) Our proposed AdaDrive also employs an asynchronous architecture but features two novel adaptive connectors: Connector-W for adaptively determining when to activate the LLM, and Connector-H for controlling how to integrate the LLM in driving tasks. This design enables enhanced flexibility in handling uncertain or emergency situations. We also incorporate LS-Qformer for efficient processing of continuous streaming data.
  • Figure 2: An overview of the AdaDrive framework, comprising generic multi-modal feature extraction and parallel slow-fast paths dedicated to logical reasoning and trajectory prediction. The two paths are adaptively integrated through our proposed Connector-W and Connector-H components, which determine when to activate the LLM and how to integrate the LLM for trajectory prediction, respectively. Dashed lines indicate intermittent execution, which occurs only when the LLM is enabled.
  • Figure 3: Comparisons between the Q-former and our proposed Long-Short Q-former (LS-Qformer).
  • Figure 4: Illustration of FIFO and our proposed PMF. Unlike FIFO, PMF maintains a compact buffer while enabling forward information flow by merging features from to-be-evicted frames into their preceding frames.
  • Figure 5: (a) Ablation on varying streaming memory buffer (SMB) capacities and content update mechanisms. (b) Comparison of our self-adaptive LLM activation vs. fixed-interval activation (freq. = 0, 0.1, 0.25, 0.5, and 1, where 0 indicates no activation and 1 indicates full activation) on driving scores. (c) Comparison of our self-adaptive LLM activation with fixed-interval LLM activation in terms of computational cost (GFLOPs) and driving scores. These analyses are performed using the LangAuto-Short benchmark.
  • ...and 2 more figures
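The contrast drawn in the Figure 4 caption, between a FIFO buffer and a buffer that merges evicted features instead of discarding them, can be illustrated with a small sketch. This is not the paper's PMF module. The caption does not specify which frame is evicted, the ordering convention behind "preceding", or the merge operator, so the code assumes the oldest frame is evicted and folded into its neighbour in the buffer by a weighted average. The names `fifo_push`, `merging_push`, and `keep` are hypothetical.

```python
from collections import deque
from typing import Deque, List, Sequence

Frame = List[float]


def fifo_push(buffer: Deque[Frame], frame: Sequence[float], capacity: int) -> None:
    """Plain FIFO: the oldest frame is dropped and its content is lost."""
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    buffer.append(list(frame))
    if len(buffer) > capacity:
        buffer.popleft()


def merging_push(
    buffer: Deque[Frame],
    frame: Sequence[float],
    capacity: int,
    keep: float = 0.5,
) -> None:
    """Merge-on-evict: the evicted frame is blended into its neighbour.

    Assumption: the oldest frame is the one evicted, and the merge is an
    element-wise weighted average with weight `keep` on the surviving frame.
    """
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    buffer.append(list(frame))
    if len(buffer) > capacity:
        evicted = buffer.popleft()
        neighbour = buffer[0]
        buffer[0] = [keep * n + (1.0 - keep) * e for n, e in zip(neighbour, evicted)]


if __name__ == "__main__":
    fifo: Deque[Frame] = deque()
    merged: Deque[Frame] = deque()
    for value in (0.0, 2.0, 4.0):
        fifo_push(fifo, [value], capacity=2)
        merging_push(merged, [value], capacity=2)
    print(list(fifo))    # FIFO keeps only the two newest frames
    print(list(merged))  # the oldest slot still carries a trace of frame 0
```

Both buffers hold the same number of frames, so memory stays bounded in either case. The difference is that the merging variant lets information from evicted frames persist in the frames that remain.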