
Personalized Autonomous Driving with Large Language Models: Field Experiments

Can Cui, Zichong Yang, Yupeng Zhou, Yunsheng Ma, Juanwu Lu, Lingxi Li, Yaobin Chen, Jitesh Panchal, Ziran Wang

TL;DR

Talk2Drive addresses the need for natural-language control and long-term personalization in autonomous driving. It uses cloud-based LLMs to translate verbal commands into executable Language Model Programs (LMPs), drawing on real-time context and a memory module that stores past interactions for personalization. Field experiments in highway, intersection, and parking scenarios with a Lexus RX450h show reduced takeover rates while maintaining safety and comfort; the memory module lowers the takeover rate by up to a further 65.2% compared with the system without memory. The work demonstrates effective interpretation of both direct and indirect commands and outlines future work on reducing LLM latency, e.g., via model distillation.

Abstract

Integrating large language models (LLMs) in autonomous vehicles enables conversation with AI systems to drive the vehicle. However, it also emphasizes the requirement for such systems to comprehend commands accurately and achieve higher-level personalization to adapt to the preferences of drivers or passengers over a more extended period. In this paper, we introduce an LLM-based framework, Talk2Drive, capable of translating natural verbal commands into executable controls and learning to satisfy personal preferences for safety, efficiency, and comfort with a proposed memory module. This is the first-of-its-kind multi-scenario field experiment that deploys LLMs on a real-world autonomous vehicle. Experiments showcase that the proposed system can comprehend human intentions at different intuition levels, ranging from direct commands like "can you drive faster" to indirect commands like "I am really in a hurry now". Additionally, we use the takeover rate to quantify the trust of human drivers in the LLM-based autonomous driving system, where Talk2Drive significantly reduces the takeover rate in highway, intersection, and parking scenarios. We also validate that the proposed memory module considers personalized preferences and further reduces the takeover rate by up to 65.2% compared with the system without a memory module. The experiment video can be watched at https://www.youtube.com/watch?v=4BWsfPaq1Ro

Paper Structure (18 sections, 11 equations, 5 figures, 4 tables)


Figures (5)

  • Figure 1: Talk2Drive framework architecture. A human's spoken instructions $I$ are processed by cloud-based LLMs, which synthesize contextual data $C$ from weather, traffic conditions, local traffic rules, and the perception results from the local end. Simultaneously, the system message $S$ and the historical data $H$ are sent to the LLMs. Then, the LLMs generate executable LMPs $P$ that are communicated to the vehicle's Electronic Control Unit (ECU). These LMPs actuate the vehicle controls, ensuring that the human's intent is translated into safe and personalized driving actions. A memory module archives every command $I$, its resultant LMPs $P$, and subsequent user feedback $F$, ensuring continuous refinement of the personalized driving experience.
  • Figure 2: The flowchart of Talk2Drive. After the speech recognition module detects the keyword 'command', the inputs ($I,C,S,H$) are sent to the LLM. Then, the LLM generates corresponding LMPs to be executed by the ECU. If the speech recognition module detects the keyword 'evaluate', the system receives human feedback ($F$), and both $F$ and its corresponding $I$ and $P$ are updated in the memory module.
  • Figure 3: Setup of the autonomous vehicle in the experiment.
  • Figure 4: The overview visualization and statistics of the test scenarios.
  • Figure 5: The experiment visualization: In the upper left corner is the in-cabin view, while the lower left corner displays the exterior view. The upper right corner shows the console, and the lower right corner presents the lidar map.
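
The control flow described in the captions of Figures 1 and 2 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class and field names (`Talk2DriveLoop`, `MemoryEntry`), the `llm` and `execute` callables, the prompt layout, and the rule that the keyword is the first word of the utterance are all assumptions made for illustration. Only the symbols $I$, $C$, $S$, $H$, $P$, $F$ and the keywords 'command' and 'evaluate' come from the captions.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class MemoryEntry:
    """One archived interaction: command I, generated LMP P, feedback F."""
    instruction: str
    program: str
    feedback: str


@dataclass
class Talk2DriveLoop:
    # Hypothetical stand-ins: `llm` maps a prompt to LMP text, and
    # `execute` hands that text to the vehicle's ECU interface.
    llm: Callable[[str], str]
    execute: Callable[[str], None]
    system_message: str                                      # S
    memory: List[MemoryEntry] = field(default_factory=list)  # source of H
    last_instruction: Optional[str] = None
    last_program: Optional[str] = None

    def handle(self, utterance: str, context: str) -> Optional[str]:
        """Route one recognized utterance; `context` plays the role of C."""
        words = utterance.strip().split(maxsplit=1)
        if not words:
            return None
        # Assumption: the trigger keyword is the first spoken word.
        keyword = words[0].lower().strip(",.:")
        rest = words[1] if len(words) > 1 else ""

        if keyword == "command":
            # Build H from the memory module and send (I, C, S, H) to the LLM.
            history = "\n".join(
                f"{m.instruction} -> {m.program} (feedback: {m.feedback})"
                for m in self.memory
            )
            prompt = (
                f"System: {self.system_message}\n"
                f"Context: {context}\n"
                f"History: {history}\n"
                f"Instruction: {rest}"
            )
            program = self.llm(prompt)  # P
            self.execute(program)
            self.last_instruction, self.last_program = rest, program
            return program

        if keyword == "evaluate" and self.last_program is not None:
            # Store F together with the I and P it refers to.
            self.memory.append(
                MemoryEntry(self.last_instruction or "", self.last_program, rest)
            )
        return None
```

In this sketch, feedback given after one command becomes part of the history sent with the next command, which is the mechanism the captions describe for personalization. A real system would also need to validate the generated program before execution; that step is omitted here.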