
COIN: Collaborative Interaction-Aware Multi-Agent Reinforcement Learning for Self-Driving Systems

Yifeng Zhang, Jieming Chen, Tingguang Zhou, Tanishq Duhan, Jianghong Dong, Yuhong Cao, Guillaume Sartoretti

Abstract

Multi-Agent Self-Driving (MASD) systems provide an effective solution for coordinating autonomous vehicles to reduce congestion and enhance both safety and operational efficiency in future intelligent transportation systems. Multi-Agent Reinforcement Learning (MARL) has emerged as a promising approach for developing advanced end-to-end MASD systems. However, achieving efficient and safe collaboration in dynamic MASD systems remains a significant challenge in dense scenarios with complex agent interactions. To address this challenge, we propose a novel collaborative (CO-) interaction-aware (-IN) MARL framework, named COIN. Specifically, we develop a new counterfactual individual-global twin delayed deep deterministic policy gradient (CIG-TD3) algorithm, crafted in a "centralized training, decentralized execution" (CTDE) manner, which jointly optimizes the individual objectives (navigation) and the global objectives (collaboration) of agents. We further introduce a dual-level interaction-aware centralized critic architecture that captures both local pairwise interactions and global system-level dependencies, enabling more accurate global value estimation and improved credit assignment for collaborative policy learning. We conduct extensive simulation experiments in dense urban traffic environments, which demonstrate that COIN consistently outperforms other advanced baseline methods in both safety and efficiency across various system sizes. These results highlight its superiority in complex and dynamic MASD scenarios, as further validated through real-world robot demonstrations. Supplementary videos are available at https://marmotlab.github.io/COIN/
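The counterfactual baseline in CIG-TD3 can be illustrated with a minimal sketch. The idea, in the COMA tradition, is to score an agent's contribution as the global Q-value of the joint action minus a baseline that marginalizes out that agent's own action. Since TD3 operates on continuous actions, the marginalization below uses Monte-Carlo samples; the function names (`counterfactual_advantage`, `q_fn`) and the sampling scheme are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def counterfactual_advantage(q_fn, state, joint_action, ego_idx, action_samples):
    """COMA-style counterfactual advantage for one agent.

    q_fn(state, joint_action) -> scalar global Q-value (a toy stand-in for
    the centralized critic). The baseline averages Q over alternative ego
    actions, so it does not depend on the ego agent's chosen action.
    """
    q_joint = q_fn(state, joint_action)
    baseline = 0.0
    for a_ego in action_samples:  # Monte-Carlo marginalization (continuous actions)
        alt = np.array(joint_action, dtype=float)
        alt[ego_idx] = a_ego
        baseline += q_fn(state, alt)
    baseline /= len(action_samples)
    return q_joint - baseline
```

A positive advantage indicates the ego agent's actual action raised the global value above what an "average" action would have achieved, which is the signal used for credit assignment during centralized training.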

Paper Structure

This paper contains 30 sections, 12 equations, 11 figures, 5 tables, and 1 algorithm.

Figures (11)

  • Figure 1: The overall COIN framework, which follows the CTDE paradigm. During training, an individual critic learns individual Q-values to guide the ego vehicle’s navigation policy, while a centralized critic learns global Q-values to facilitate collaborative policy learning. These two critics work together to achieve joint policy optimization. During execution, the actor learns an end-to-end policy based only on local observations. The right side illustrates the roles of the individual critic and centralized critic. Notably, we design a dual-level interaction-aware centralized critic with a counterfactual baseline to improve credit assignment and promote better agent cooperation.
  • Figure 2: The structure of our proposed dual-level interaction-aware centralized critic network. It includes a local interaction-aware module to capture pairwise interactions between agents, and a global interaction-aware module to dynamically model global dependencies. These two modules work together to estimate the global Q-value. In addition, a counterfactual baseline (which is not conditioned on the ego agent’s action information) is used to improve credit assignment and guide cooperative policy learning during centralized training.
  • Figure 3: Illustration of three typical highly interactive and dense traffic scenarios for MASD systems in the MetaDrive simulator: (a) roundabout (left), (b) intersection (top right), and (c) bottleneck (bottom right).
  • Figure 4: Smoothed average reward curves over training steps for COIN and baselines (e.g., ITD3 and TraCo) across three environments, where COIN consistently achieves higher rewards and faster convergence than other baselines.
  • Figure 5: Performance comparison of COIN and other baselines across success rate, efficiency, and safety in three traffic environments, with radar plots showing that COIN forms the largest enclosed area in all scenarios.
  • ...and 6 more figures