MAPPO-PIS: A Multi-Agent Proximal Policy Optimization Method with Prior Intent Sharing for CAVs' Cooperative Decision-Making
Yicheng Guo, Jiaqi Liu, Rongjie Yu, Peng Hang, Jian Sun
TL;DR
MAPPO-PIS addresses cooperative decision-making for connected and autonomous vehicles (CAVs) in merging areas under human-machine mixed traffic. It extends MAPPO with two modules: an Intention Generator Module (IGM) that generates multi-step future trajectories, and a Safety Enhanced Module (SEM) that detects and corrects unsafe intents. Built on a centralized training and distributed execution MARL framework, MAPPO-PIS shows improved safety and efficiency over baselines across diverse traffic densities and heterogeneous vehicle settings. Curriculum learning supports training, and ablation studies are used to validate the design. Key findings include reduced collision rates, higher average speeds, and more stable learning curves; macro-level analyses indicate delayed bottleneck breakdown and faster recovery in merging flows. The work highlights explicit intent sharing combined with safety-aware correction as a practical approach to improving real-world CAV merging performance.
Abstract
Vehicle-to-Vehicle (V2V) technologies have great potential for enhancing traffic flow efficiency and safety. However, cooperative decision-making in multi-agent systems, particularly in complex human-machine mixed merging areas, remains challenging for connected and autonomous vehicles (CAVs). Intent sharing, a key aspect of human coordination, may offer an effective solution to these decision-making problems, but its application in CAVs is under-explored. This paper presents an intent-sharing-based cooperative method, Multi-Agent Proximal Policy Optimization with Prior Intent Sharing (MAPPO-PIS), which models the CAV cooperative decision-making problem as a Multi-Agent Reinforcement Learning (MARL) problem. The agents' policies are trained and updated through the integration of two key modules: the Intention Generator Module (IGM) and the Safety Enhanced Module (SEM). The IGM generates and disseminates each CAV's intended trajectory over multiple future time-steps. The SEM assesses the safety of the resulting decisions and rectifies them if necessary. A merging area with human-machine mixed traffic flow is selected to validate the method. Results show that MAPPO-PIS significantly improves decision-making performance in multi-agent systems, surpassing state-of-the-art baselines in safety, efficiency, and overall traffic system performance. The code and video demo can be found at: https://github.com/CCCC1dhcgd/A-MAPPO-PIS
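To make the two-module idea concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a one-dimensional lane, a constant-acceleration rollout standing in for the IGM, and a simple gap check with a braking fallback standing in for the SEM. All names (`Vehicle`, `generate_intent`, `is_safe`, `correct_intent`) and all parameters (horizon, time step, minimum gap, candidate accelerations) are hypothetical choices made for illustration; in the paper, intents come from a learned policy.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Vehicle:
    position: float  # longitudinal position along a shared lane (m), assumed 1-D
    speed: float     # current speed (m/s)


def generate_intent(vehicle: Vehicle, accel: float,
                    horizon: int = 5, dt: float = 0.5) -> List[float]:
    """Hypothetical stand-in for an intent generator: roll out future
    positions over `horizon` steps under a constant acceleration."""
    positions = []
    pos, speed = vehicle.position, vehicle.speed
    for _ in range(horizon):
        speed = max(0.0, speed + accel * dt)  # vehicles do not reverse
        pos += speed * dt
        positions.append(pos)
    return positions


def is_safe(follower_intent: Sequence[float], leader_intent: Sequence[float],
            min_gap: float = 5.0) -> bool:
    """Check that the follower keeps at least `min_gap` metres behind
    the leader at every shared future step."""
    return all(l - f >= min_gap for f, l in zip(follower_intent, leader_intent))


def correct_intent(follower: Vehicle, leader_intent: Sequence[float],
                   preferred_accel: float,
                   candidates: Sequence[float] = (0.0, -1.0, -2.0, -3.0),
                   min_gap: float = 5.0) -> Tuple[float, List[float]]:
    """Hypothetical stand-in for a safety module: keep the preferred
    action if its intent is safe, otherwise try progressively harder braking."""
    for accel in (preferred_accel, *candidates):
        intent = generate_intent(follower, accel)
        if is_safe(intent, leader_intent, min_gap):
            return accel, intent
    # No candidate is safe: fall back to the strongest braking available.
    accel = min(candidates)
    return accel, generate_intent(follower, accel)


# The leader shares its intent; the follower checks its own plan against it.
leader = Vehicle(position=20.0, speed=5.0)
follower = Vehicle(position=10.0, speed=10.0)
leader_intent = generate_intent(leader, accel=0.0)
chosen_accel, follower_intent = correct_intent(follower, leader_intent,
                                               preferred_accel=1.0)
```

In this toy setting the follower's preferred acceleration closes the gap too quickly, so the check rejects it and a braking action is selected instead. The design point it illustrates is that sharing a multi-step intent lets a conflict be detected and corrected before it occurs, rather than reacting to the current state alone.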
