LLM4AD: Large Language Models for Autonomous Driving -- Concept, Review, Benchmark, Experiments, and Future Trends
Can Cui, Yunsheng Ma, Sung-Yeon Park, Zichong Yang, Yupeng Zhou, Juanwu Lu, Juntong Peng, Jiaru Zhang, Ruqi Zhang, Lingxi Li, Yaobin Chen, Jitesh H. Panchal, Amr Abdelraouf, Rohit Gupta, Kyungtae Han, Ziran Wang
TL;DR
The paper defines LLM4AD, a concept in which large language models serve as the decision-making brain of an autonomous driving system while safety-critical, low-latency control is delegated to conventional planners. It introduces a modular framework comprising a human interface, system messages, situation descriptors, a memory module, and an Executor, which produces executable policies P and reasoning R from inputs I, F, S, C, and H. For evaluation, it presents the open benchmarks LaMPilot-Bench, CARLA Leaderboard 1.0, and NuPlanQA, and reports real-world experiments with Talk2Drive (cloud) and ViLaD (on-board VLM) that show the benefits of personalization and the impact of real-time constraints. The work further outlines ViLaD, a Vision-Language Diffusion model that enables parallel generation and bidirectional reasoning, and covers on-board deployment and memory-assisted adaptation. It also discusses open challenges in latency, safety, privacy, and trust. Together, these contributions lay a path toward practical, personalized, human-centric autonomous driving, supported by standardized benchmarks and diffusion-based paradigms intended to overcome current limitations.
Abstract
With the broader adoption and highly successful development of Large Language Models (LLMs), there has been growing interest in, and demand for, applying LLMs to autonomous driving technology. Driven by their natural language understanding and reasoning capabilities, LLMs have the potential to enhance various aspects of autonomous driving systems, from perception and scene understanding to interactive decision-making. In this paper, we first introduce the novel concept of designing Large Language Models for Autonomous Driving (LLM4AD), followed by a review of existing LLM4AD studies. Then, we propose a comprehensive benchmark for evaluating the instruction-following and reasoning abilities of LLM4AD systems, which includes LaMPilot-Bench, the CARLA Leaderboard 1.0 benchmark in simulation, and NuPlanQA for multi-view visual question answering. Furthermore, we conduct extensive real-world experiments on autonomous vehicle platforms, examining both on-cloud and on-edge LLM deployment for personalized decision-making and motion control. Next, we explore the future trend of integrating language diffusion models into autonomous driving, exemplified by the proposed ViLaD (Vision-Language Diffusion) framework. Finally, we discuss the main challenges of LLM4AD, including latency, deployment, security and privacy, safety, trust and transparency, and personalization.
