AgentsCoMerge: Large Language Model Empowered Collaborative Decision Making for Ramp Merging

Senkang Hu, Zhengru Fang, Zihan Fang, Yiqin Deng, Xianhao Chen, Yuguang Fang, Sam Kwong

TL;DR

This work addresses ramp-merging bottlenecks in connected autonomous traffic by introducing AgentsCoMerge, a large language model–empowered framework for collaborative decision-making among multiple CAVs. The approach integrates a scene understanding module (vision and text), a hierarchical planning engine, inter-agent communication, and a reinforcement-reflection training paradigm to enable coordinated ramp merging with interpretable reasoning. Key contributions include a DPOMDP-Com–based collaboration formulation, a vision-language perception stack feeding an LLM reasoning core with chain-of-thought prompts, and a two-stage training regimen that improves planning reliability and safety. Empirical results on LimSim++, nuScenes, and HighD show better efficiency, safety, and driving performance than state-of-the-art baselines, highlighting the potential of LLM-driven collaborative autonomous driving in complex merging scenarios.

Abstract

Ramp merging is one of the bottlenecks in traffic systems and commonly causes traffic congestion, accidents, and severe carbon emissions. To address this issue and enhance the safety and efficiency of connected and autonomous vehicles (CAVs) at multi-lane merging zones, we propose a novel collaborative decision-making framework, named AgentsCoMerge, that leverages large language models (LLMs). Specifically, we first design a scene observation and understanding module that allows an agent to capture the traffic environment. We then propose a hierarchical planning module that enables the agent to make decisions and plan trajectories based on the observation and the agent's own state. In addition, to facilitate collaboration among multiple agents, we introduce a communication module that lets surrounding agents exchange necessary information and coordinate their actions. Finally, we develop a reinforcement-reflection-guided training paradigm to further enhance the decision-making capability of the framework. Extensive experiments are conducted to evaluate the performance of our proposed method, demonstrating its superior efficiency and effectiveness for multi-agent collaborative decision-making under various ramp merging scenarios.

Paper Structure (26 sections, 35 equations, 8 figures, 5 tables, 1 algorithm)

Figures (8)

  • Figure 1: The multi-lane merging scenarios
  • Figure 2: The kinematic bicycle model of a CAV
  • Figure 3: Overall architecture of AgentsCoMerge, the proposed collaborative driving framework for multi-lane merging scenarios. It has four parts: 1) scene understanding, 2) planning, 3) inter-agent communication, and 4) reinforcement reflection. First, the agent interprets the scene and its environment through the scene observation and understanding module. Then, it plans and makes decisions based on the environment and messages from other agents. Finally, it reflects on those decisions and improves its decision-making capability through reinforcement-reflection-guided training.
  • Figure 4: Design of the visual-based scene understanding module.
  • Figure 5: Illustration of the collaborative area and the merging area in a multi-lane merging scenario.
  • ...and 3 more figures
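
Figure 2 refers to the kinematic bicycle model of a CAV. The paper's own equations are not reproduced in this outline, so the following is only a minimal sketch of the standard rear-axle-referenced form of that model, integrated with a forward-Euler step. The names `VehicleState` and `step`, the 2.7 m wheelbase, and the 0.1 s time step are illustrative assumptions, not values taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class VehicleState:
    x: float    # rear-axle x position in metres
    y: float    # rear-axle y position in metres
    yaw: float  # heading angle in radians
    v: float    # longitudinal speed in m/s


def step(state: VehicleState, accel: float, steer: float,
         wheelbase: float = 2.7, dt: float = 0.1) -> VehicleState:
    """Advance the state by one forward-Euler step.

    Standard rear-axle kinematic bicycle model (assumed form):
        x'   = v * cos(yaw)
        y'   = v * sin(yaw)
        yaw' = v / L * tan(steer)
        v'   = accel
    """
    x = state.x + state.v * math.cos(state.yaw) * dt
    y = state.y + state.v * math.sin(state.yaw) * dt
    yaw = state.yaw + state.v / wheelbase * math.tan(steer) * dt
    v = state.v + accel * dt
    return VehicleState(x, y, yaw, v)
```

With zero steering and zero acceleration the vehicle travels in a straight line at constant speed; a positive steering angle increases the heading, which is the behaviour a trajectory planner would rely on when rolling out candidate merge manoeuvres.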