Interact, Instruct to Improve: A LLM-Driven Parallel Actor-Reasoner Framework for Enhancing Autonomous Vehicle Interactions

Shiyu Fang, Jiaqi Liu, Chengkai Xu, Chen Lv, Peng Hang, Jian Sun

TL;DR

This work tackles real-time bidirectional interaction between autonomous vehicles (AVs) and human-driven vehicles (HVs) with a parallel Actor-Reasoner framework, in which an LLM-driven Reasoner and a memory-based Actor together interpret and express driving intentions. The Reasoner uses Chain-of-Thought reasoning to infer each HV's intention and driving style and to generate external Human-Machine Interface (eHMI) cues, while the Actor rapidly retrieves feasible actions from a partitioned memory through two-layer retrieval, enabling fast, context-aware decisions. Ablation studies and multi-vehicle experiments show that memory partitioning and two-layer retrieval substantially improve safety and efficiency, and field tests confirm practical applicability in real-world traffic. The framework promises improved interpretability, adaptability to heterogeneous HVs, and scalable deployment for real-time AV-HV interactions.
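
The parallel arrangement can be illustrated with a minimal Python sketch. It assumes a slow Reasoner running in a background thread and a fast Actor that reads the most recent inferred driving style without waiting for it. The names `SharedBelief`, `slow_reasoner`, and `fast_actor`, the sleep standing in for LLM latency, the acceleration rule, and the style-to-action mapping are all hypothetical and are not taken from the paper's code.

```python
import threading
import time


class SharedBelief:
    """Thread-safe holder for the latest HV driving style inferred by the Reasoner."""

    def __init__(self, style="unknown"):
        self._lock = threading.Lock()
        self._style = style

    def set(self, style):
        with self._lock:
            self._style = style

    def get(self):
        with self._lock:
            return self._style


def slow_reasoner(belief, observation, delay_s=0.05):
    """Stand-in for the LLM-driven Reasoner (hypothetical rule, not the paper's prompt)."""
    time.sleep(delay_s)  # simulates slow LLM inference
    style = "aggressive" if observation["hv_accel"] > 0 else "conservative"
    belief.set(style)


def fast_actor(belief):
    """Stand-in for the Actor: an instant lookup keyed on the latest belief."""
    # Hypothetical mapping; the real Actor retrieves actions from interaction memory.
    actions = {"aggressive": "yield", "conservative": "proceed"}
    return actions.get(belief.get(), "slow_down")


if __name__ == "__main__":
    belief = SharedBelief()
    reasoner = threading.Thread(
        target=slow_reasoner, args=(belief, {"hv_accel": 1.2})
    )
    reasoner.start()
    # The Actor keeps producing actions while the Reasoner is still thinking.
    for _ in range(5):
        print(fast_actor(belief))
        time.sleep(0.02)
    reasoner.join()
```

In this sketch the Actor never blocks on the Reasoner: it falls back to a cautious default until an inferred style arrives, then switches to the style-specific action.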

Abstract

Autonomous Vehicles (AVs) have entered the commercialization stage, but their limited ability to interact and express intentions still poses challenges in interactions with Human-driven Vehicles (HVs). Recent advances in large language models (LLMs) enable bidirectional human-machine communication, but the conflict between slow inference speed and the need for real-time decision-making hinders practical deployment. To address these issues, this paper introduces a parallel Actor-Reasoner framework designed to enable explicit bidirectional AV-HV interactions across multiple scenarios. First, an interaction memory database, referred to as the Actor, is built from interactions between the LLM-driven Reasoner and heterogeneous simulated HVs during training. Then, a memory partition module and a two-layer memory retrieval module are introduced, which significantly enhance the Actor's ability to handle heterogeneous HVs. Ablation studies and comparisons with other decision-making methods demonstrate that the proposed Actor-Reasoner framework significantly improves safety and efficiency. Finally, by combining the external Human-Machine Interface (eHMI) information derived from the Reasoner's reasoning with the feasible action solutions retrieved from the Actor, the effectiveness of the proposed Actor-Reasoner framework is confirmed in multi-scenario field interactions. Our code is available at https://github.com/FanGShiYuu/Actor-Reasoner.
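
To make the memory partition and two-layer retrieval ideas concrete, here is a minimal Python sketch. The abstract does not say what each retrieval layer computes, so this example assumes one plausible arrangement: memories are partitioned by HV driving style, a first layer shortlists candidates with a cheap one-feature comparison, and a second layer picks the closest candidate by full Euclidean distance. The class names, the feature tuple, and the `shortlist` parameter are hypothetical and are not taken from the paper's code.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Memory:
    # Hypothetical scenario features, e.g. (gap in m, relative speed in m/s).
    state: Tuple[float, ...]
    # Action that was feasible in that situation.
    action: str


@dataclass
class PartitionedMemory:
    # One list of memories per driving style (the assumed partition key).
    partitions: Dict[str, List[Memory]] = field(default_factory=dict)

    def add(self, style: str, memory: Memory) -> None:
        self.partitions.setdefault(style, []).append(memory)

    def retrieve(
        self, style: str, state: Tuple[float, ...], shortlist: int = 3
    ) -> Optional[str]:
        candidates = self.partitions.get(style, [])
        if not candidates:
            return None  # no experience stored for this driving style
        # Layer 1 (assumed): coarse shortlist using only the first feature.
        coarse = sorted(candidates, key=lambda m: abs(m.state[0] - state[0]))
        coarse = coarse[:shortlist]
        # Layer 2 (assumed): exact nearest neighbour over all features.
        best = min(coarse, key=lambda m: math.dist(m.state, state))
        return best.action


if __name__ == "__main__":
    memory = PartitionedMemory()
    memory.add("aggressive", Memory((5.0, 2.0), "yield"))
    memory.add("aggressive", Memory((30.0, -1.0), "proceed"))
    memory.add("conservative", Memory((5.0, 2.0), "proceed"))
    print(memory.retrieve("aggressive", (6.0, 1.5)))
    print(memory.retrieve("conservative", (6.0, 1.5)))
```

Partitioning means the same scenario state can map to different actions depending on the inferred HV style, and each lookup searches only one partition rather than the whole database.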

Paper Structure

This paper contains 23 sections, 6 equations, 10 figures, 2 tables, and 2 algorithms.

Figures (10)

  • Figure 1: Main challenges in improving the interaction and intent expression capabilities of AVs.
  • Figure 2: Overview of the proposed Actor-Reasoner architecture for driving interaction.
  • Figure 3: Illustration of the Reasoner's CoT-based reasoning process.
  • Figure 4: Ablation Study Results on Success Rates Across Various Scenarios.
  • Figure 5: Retrieval time performance of the Actor with varying numbers of stored memories.
  • ...and 5 more figures