Asynchronous Large Language Model Enhanced Planner for Autonomous Driving

Yuan Chen, Zi-han Ding, Ziqin Wang, Yan Wang, Lijun Zhang, Si Liu

TL;DR

AsyncDriver introduces an asynchronous LLM-enhanced closed-loop framework for autonomous driving that decouples LLM inference from real-time planners. It uses a Scene-Associated Instruction Feature Extraction Module and an Adaptive Injection Block to fuse language-based routing instructions with vectorized scene data, guiding trajectory predictions while maintaining real-time performance. Through pretraining with Planning-QA and Reasoning1K followed by fine-tuning, the method achieves superior closed-loop results on nuPlan Hard20, with notable gains in the drivable-area and time-to-collision (TTC) metrics and robust performance under asynchronous inference. The work demonstrates the practical benefit of reducing LLM latency in safety-critical planning, offers a model-agnostic injection mechanism that can extend to other transformer-based planners, and highlights future directions for broader generalization and real-world deployment.

Abstract

Despite real-time planners exhibiting remarkable performance in autonomous driving, the growing exploration of Large Language Models (LLMs) has opened avenues for enhancing the interpretability and controllability of motion planning. Nevertheless, LLM-based planners continue to encounter significant challenges, including elevated resource consumption and extended inference times, which pose substantial obstacles to practical deployment. In light of these challenges, we introduce AsyncDriver, a new asynchronous LLM-enhanced closed-loop framework designed to leverage scene-associated instruction features produced by an LLM to guide real-time planners in making precise and controllable trajectory predictions. On the one hand, our method highlights the prowess of LLMs in comprehending and reasoning with vectorized scene data and a series of routing instructions, demonstrating their effective assistance to real-time planners. On the other hand, the proposed framework decouples the inference processes of the LLM and the real-time planner. By capitalizing on the asynchronous nature of their inference frequencies, our approach successfully reduces the computational cost introduced by the LLM while maintaining comparable performance. Experiments show that our approach achieves superior closed-loop evaluation performance on nuPlan's challenging scenarios.
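
The decoupling of inference frequencies described in the abstract can be illustrated with a minimal scheduling sketch. It is not the authors' implementation: the names `run_async_planning`, `llm_features`, `planner_step`, and `llm_interval` are hypothetical, the scene and feature values are toy numbers rather than real encodings, and the loop simulates the update schedule sequentially instead of running two concurrent processes. The only idea shown is that the planner runs at every step while the LLM-derived features are refreshed every `llm_interval` steps and reused in between.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def run_async_planning(
    scenes: Sequence[float],
    llm_features: Callable[[float], float],
    planner_step: Callable[[float, float], float],
    llm_interval: int,
) -> Tuple[List[float], int]:
    """Run the planner at every step, refreshing LLM features only periodically.

    All arguments are illustrative stand-ins (assumptions, not the paper's API):
    `llm_features` plays the role of the slow LLM branch, `planner_step` the
    fast real-time planner that consumes the most recent cached features.
    """
    if llm_interval < 1:
        raise ValueError("llm_interval must be at least 1")

    cached: Optional[float] = None   # most recent LLM-derived features
    llm_calls = 0                    # how often the slow branch was invoked
    outputs: List[float] = []

    for step, scene in enumerate(scenes):
        # Slow branch: only every `llm_interval` planner steps.
        if step % llm_interval == 0:
            cached = llm_features(scene)
            llm_calls += 1
        # Fast branch: every step, reusing possibly stale features.
        assert cached is not None
        outputs.append(planner_step(scene, cached))

    return outputs, llm_calls


# Toy stand-ins so the sketch runs end to end.
def toy_llm(scene: float) -> float:
    return scene * 10.0


def toy_planner(scene: float, features: float) -> float:
    return scene + features


toy_scenes = [float(i) for i in range(10)]
toy_outputs, toy_llm_calls = run_async_planning(
    toy_scenes, toy_llm, toy_planner, llm_interval=3
)
```

With ten planner steps and `llm_interval=3`, the toy LLM is called at steps 0, 3, 6, and 9 only, so four LLM calls serve ten planner calls. Setting `llm_interval=1` recovers the synchronous case in which the LLM runs at every step.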

Paper Structure

This paper contains 37 sections, 6 equations, 12 figures, and 4 tables.

Figures (12)

  • Figure 1: Comparative Overview of Learning-based Autonomous Driving Planning Frameworks. (a) Real-time planner: Offers quick inference but has limited controllability. (b) LLM-based planner: Produces linguistic descriptions and controls, offering high interactivity and interpretability at the expense of inference speed. (c) AsyncDriver: Leverages the reasoning capabilities of the LLM while balancing performance and inference speed through asynchronous control.
  • Figure 2: Overview of our proposed AsyncDriver framework. Scene information, together with routing instructions, is encoded through the Scene-Associated Instruction Feature Extraction Module. Subsequently, the Adaptive Injection Block asynchronously enhances the features of the real-time planner, facilitating closed-loop control for autonomous vehicles. The Alignment Assistance Module is exclusively employed for multi-modality alignment during training.
  • Figure 3: Asynchronous Inference. Evaluation of asynchronous inference as the inference interval between the LLM and the real-time planner is expanded over $[1,9,17,29,49,79,149]$. Inference time was measured on a Tesla A30 GPU.
  • Figure 4: Visualization of AsyncDriver following human instruction. The light blue line represents ego’s trajectory for the next $8$ seconds. The figure contrasts the planned trajectories of the ego when it receives conventional routing instructions and when it receives a forced stop instruction.
  • ...and 7 more figures