Hierarchical Multi-Agent MCTS for Safety-Critical Coordination in Mixed-Autonomy Roundabouts
Zhihao Lin, Shuo Liu, Zhen Tian, Dezong Zhao, Jianglin Lan, Chongfeng Wei
TL;DR
This work tackles safety-critical coordination for mixed-autonomy traffic at unsignalized dual-lane roundabouts by integrating multi-agent Monte Carlo Tree Search (MCTS) with a hierarchical risk assessment. It models CAV and HDV interactions jointly as a multi-agent MDP, introduces a lane-specific HDV uncertainty model, and uses safety-aware pruning together with a Shapley-value-based reward to balance individual and collective performance. In simulation, the approach reduces trajectory deviations and Post-Encroachment Time (PET) violations relative to benchmark methods: the PET violation rate falls to 1.0% in the fully autonomous scenario and to 3.2% in the mixed-traffic scenario (50% AVs, 50% HDVs), while arrival rates remain high. The framework offers a practical, interpretable planning mechanism, with scalability and generalization to other roundabout geometries as potential extensions.
Abstract
Navigating unsignalized roundabouts in mixed-autonomy traffic presents significant challenges due to dense vehicle interactions, lane-changing complexities, and behavioral uncertainties of human-driven vehicles (HDVs). This paper proposes a safety-critical decision-making framework for connected and automated vehicles (CAVs) navigating dual-lane roundabouts alongside HDVs. We formulate the problem as a multi-agent Markov Decision Process and develop a hierarchical safety assessment mechanism that evaluates three critical interaction types: CAV-to-CAV (C2C), CAV-to-HDV (C2H), and CAV-to-Boundary (C2B). A key contribution is our lane-specific uncertainty model for HDVs, which captures distinct behavioral patterns between inner and outer lanes, with outer-lane vehicles exhibiting $2.3\times$ higher uncertainty due to less constrained movements. We integrate this safety framework with a multi-agent Monte Carlo Tree Search (MCTS) algorithm that employs safety-aware pruning to eliminate high-risk trajectories while maintaining computational efficiency. The reward function incorporates Shapley value-based credit assignment to balance individual performance with group coordination. Extensive simulation results validate the effectiveness of the proposed approach under both fully autonomous (100% AVs) and mixed traffic (50% AVs + 50% HDVs) conditions. Compared to benchmark methods, our framework consistently reduces trajectory deviations across all AVs and significantly lowers the rate of Post-Encroachment Time (PET) violations, achieving only 1.0% in the fully autonomous scenario and 3.2% in the mixed traffic setting.
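To make the Shapley value-based credit assignment concrete, the following is a minimal sketch of exact Shapley values for a small group of vehicles. It is not the paper's reward function: the abstract does not specify the group value function, so `toy_group_reward`, the agent names, and the 0.5 coordination bonus are hypothetical and chosen only for illustration. Exact enumeration also scales exponentially with the number of agents, so a larger fleet would likely need a sampling-based approximation.

```python
from itertools import combinations
from math import factorial


def shapley_values(agents, value):
    """Exact Shapley value of each agent for a coalition value function.

    `value` maps a frozenset of agents to a real-valued group reward.
    """
    n = len(agents)
    phi = {a: 0.0 for a in agents}
    for a in agents:
        others = [b for b in agents if b != a]
        for k in range(n):
            # Weight of a coalition of size k that excludes agent `a`.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of `a` when joining coalition `s`.
                phi[a] += weight * (value(s | {a}) - value(s))
    return phi


def toy_group_reward(coalition):
    """Hypothetical group reward (an assumption, not taken from the paper).

    Each vehicle contributes 1.0; cav_1 and cav_2 earn an extra 0.5
    when both are present, standing in for a coordination benefit.
    """
    base = float(len(coalition))
    bonus = 0.5 if {"cav_1", "cav_2"} <= coalition else 0.0
    return base + bonus


credits = shapley_values(["cav_1", "cav_2", "cav_3"], toy_group_reward)
print(credits)
```

In this toy setting the coordination bonus is split evenly between `cav_1` and `cav_2`, `cav_3` receives only its individual contribution, and the credits sum to the reward of the full group, which is the efficiency property that makes Shapley values a natural way to trade off individual and collective performance.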
