Agent-Agnostic Centralized Training for Decentralized Multi-Agent Cooperative Driving

Shengchao Yan; Lukas König; Wolfram Burgard

TL;DR

This work proposes an asymmetric actor-critic model that learns decentralized cooperative driving policies for autonomous vehicles using single-agent reinforcement learning, and efficiently manages real-world traffic dynamics and partial observability.

Abstract

Active traffic management with autonomous vehicles offers the potential for reduced congestion and improved traffic flow. However, developing effective algorithms for real-world scenarios requires overcoming challenges related to infinite-horizon traffic flow and partial observability. To address these issues and further decentralize traffic management, we propose an asymmetric actor-critic model that learns decentralized cooperative driving policies for autonomous vehicles using single-agent reinforcement learning. By employing attention neural networks with masking, our approach efficiently manages real-world traffic dynamics and partial observability, eliminating the need for predefined agents or agent-specific experience buffers in multi-agent reinforcement learning. Extensive evaluations across various traffic scenarios demonstrate our method's significant potential in improving traffic flow at critical bottleneck points. Moreover, we address the challenges posed by conservative autonomous vehicle driving behaviors that adhere strictly to traffic rules, showing that our cooperative policy effectively alleviates potential slowdowns without compromising safety.
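The masking idea mentioned above can be illustrated with a minimal sketch. It is not the authors' network: it assumes a fixed number of vehicle slots, a single query vector, and plain scaled dot-product attention, and the names `masked_attention`, `keys`, `values`, and `mask` are hypothetical. Slots that are empty or outside the sensing range receive zero attention weight, so the same function handles any number of vehicles without predefined agents.

```python
import math


def masked_attention(query, keys, values, mask):
    """Scaled dot-product attention for one query over a fixed set of slots.

    mask[i] is True for a slot holding an observed vehicle and False for
    padding or a vehicle outside the sensing range (assumed convention).
    Returns the pooled value vector and the attention weights.
    """
    if not any(mask):
        # No visible vehicle: return zeros instead of dividing by zero.
        return [0.0] * len(values[0]), [0.0] * len(mask)

    scale = math.sqrt(len(query))
    # Masked slots get a score of -inf, which becomes weight 0 after softmax.
    scores = [
        sum(q * k for q, k in zip(query, key)) / scale if keep else -math.inf
        for key, keep in zip(keys, mask)
    ]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    pooled = [
        sum(w * v[d] for w, v in zip(weights, values))
        for d in range(len(values[0]))
    ]
    return pooled, weights


# Three slots, the last one masked out (illustrative numbers only).
keys = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
values = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [True, True, False]
pooled, weights = masked_attention([1.0, 1.0], keys, values, mask)
```

Because the masked slot contributes nothing to `pooled`, its contents can be arbitrary padding, which is what lets a single network process a varying set of vehicles.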

Paper Structure

This paper contains 25 sections, 1 equation, 5 figures, and 2 tables.

Introduction
Background and Related Work
Single-Agent and Multi-Agent Reinforcement Learning
Traffic Management with Reinforcement Learning
Safety and Cautiousness in Autonomous Driving
Methods
State, Observation, Action and Reward
State Space
Observation Space
Action Space
Reward Function
Asymmetric Actor Critic
Experiments
Experiment Setup
Traffic Episodes for Training and Evaluation
...and 10 more sections

Figures (5)

Figure 1: Common traffic bottlenecks: on-ramp merge, four-way intersection, three-way intersection, lane drop. AVs follow the learned policy only in the blue areas as described in Sec. \ref{sec:state_space}.
Figure 2: Vehicle $2$ intends to merge into a dense freeway. Green vehicles are AVs, while white ones are HVs. The dashed circle represents the sensing range of vehicle $1$. A gap for vehicle $2$ to merge in can be created by either lane changing of AV $1$ or slowing down of AV $3$.
Figure 3: Policy network. The network input is from the on-ramp scenario visualized in Fig. \ref{fig:onramp}, where two AVs out of three are activated.
Figure 4: Critic network. The input embedding layer shares the same parameters with the policy network.
Figure 5: Comparison of traffic flow for different vehicle groups on the on-ramp map with a traffic input of $3500\,\text{veh/h}$. Travel times ($T_\mathrm{travel}$) are reported as the median with lower and upper quartiles over all vehicles that successfully exit the system across $20$ evaluation episodes. Throughput is the ratio of vehicles that exit to the total number of vehicles introduced during these episodes. "NC-$x$" denotes scenarios without a controller at $x\%$ AV penetration, while "DVC-$x$" denotes scenarios using our decentralized policy at the same penetration rate.
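
The two metrics in the Figure 5 caption can be computed as in the sketch below. It is only an illustration: the function name `summarize_episode_metrics` and the sample numbers are hypothetical, and the "inclusive" quartile convention is an assumption, since the caption does not state which one the paper uses.

```python
from statistics import quantiles


def summarize_episode_metrics(travel_times, vehicles_introduced):
    """Summarize travel time and throughput for a batch of episodes.

    travel_times: travel times of the vehicles that exited the system.
    vehicles_introduced: total number of vehicles spawned in those episodes.
    """
    # Quartile convention is an assumption; other conventions differ slightly.
    lower, median, upper = quantiles(travel_times, n=4, method="inclusive")
    # Throughput: vehicles that exited divided by vehicles introduced.
    throughput = len(travel_times) / vehicles_introduced
    return {
        "lower_quartile": lower,
        "median": median,
        "upper_quartile": upper,
        "throughput": throughput,
    }


# Made-up data: five vehicles exited out of eight introduced.
metrics = summarize_episode_metrics([10.0, 20.0, 30.0, 40.0, 50.0], 8)
```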
