Reward-Driven Automated Curriculum Learning for Interaction-Aware Self-Driving at Unsignalized Intersections
Zengqi Peng, Xiao Zhou, Lei Zheng, Yubin Wang, Jun Ma
TL;DR
This work tackles safe and efficient interaction-aware autonomous driving at unsignalized intersections under uncertainty in both the intentions and the number of surrounding vehicles (SVs). It introduces Reward-Driven Automated Curriculum Proximal Policy Optimization (RD-ACPPO), which uses an Exp3-inspired multi-armed bandit (MAB) to automatically sequence curricula with increasing numbers of SVs. The bandit is coupled with a PPO-based policy update guided by a carefully designed composite reward. Key contributions include the automated curriculum selection mechanism with target MAB synchronization and a PPO-based training loop that operates over dynamically chosen curricula. They also include comparative experiments in Highway_Env showing the highest task success rate, robustness to the initialization parameters of the curriculum selection module, and adaptability to diverse traffic configurations, plus further validation in the high-fidelity CARLA simulator. The approach improves sample efficiency and safety in complex, interaction-heavy driving scenarios, offering a practical pathway to transferable autonomous driving policies for unsignalized intersections.
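For reference, the standard PPO clipped surrogate objective is shown below. Here $r_t(\theta)$ is the probability ratio between the current and previous policies, $\hat{A}_t$ is an advantage estimate, and $\epsilon$ is the clipping parameter. This is the generic PPO formulation, not necessarily RD-ACPPO's exact loss. Any value-function, entropy, or curriculum-specific terms the method may add are not shown.

$$
L^{\mathrm{CLIP}}(\theta) = \hat{\mathbb{E}}_t\left[\min\left(r_t(\theta)\,\hat{A}_t,\; \operatorname{clip}\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t\right)\right],
\qquad
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}
$$

Clipping the ratio keeps each policy update close to the data-collecting policy. This matters when the training distribution shifts as the curriculum changes.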
Abstract
In this work, we present a reward-driven automated curriculum reinforcement learning approach for interaction-aware self-driving at unsignalized intersections. The approach accounts for the uncertainties associated with surrounding vehicles (SVs), namely uncertainty in both the SVs' driving intentions and their number. To address this problem, the curriculum set is designed so that successive curricula contain a progressively increasing number of SVs. An automated curriculum selection mechanism allocates importance weights across the curricula, improving sample efficiency and training outcomes. Furthermore, the reward function is carefully designed to guide the agent toward effective policy exploration. The framework uses automated curriculum learning to progressively increase task difficulty, which lets it proactively address these uncertainties at unsignalized intersections. This in turn enables safe self-driving through effective interaction with SVs. Comparative experiments are conducted in Highway_Env. The results indicate that our approach achieves the highest task success rate, is robust to the initialization parameters of the curriculum selection module, and adapts well to diverse situational configurations at unsignalized intersections. The effectiveness of the proposed method is further validated in the high-fidelity CARLA simulator.
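The curriculum selection described above allocates importance weights across curricula, and the method summary identifies this mechanism as Exp3-inspired. The sketch below shows plain Exp3 over a curriculum set. It is not the authors' implementation, and it rests on several assumptions. Each arm index stands for a curriculum with a growing SV count. The bandit reward is assumed to be normalized to [0, 1]. The class name `Exp3CurriculumSelector` is hypothetical. The paper's target-MAB synchronization and its actual bandit reward definition are not reproduced. In the toy loop, fixed reward constants stand in for "train PPO on the chosen curriculum and measure progress".

```python
import math
import random
from typing import List, Optional


class Exp3CurriculumSelector:
    """Plain Exp3 bandit over curricula (hypothetical sketch, not RD-ACPPO's code).

    Arm k is assumed to be a curriculum whose number of SVs grows with k.
    Bandit rewards are assumed to be normalized to [0, 1].
    """

    def __init__(self, num_curricula: int, gamma: float = 0.1, seed: Optional[int] = None):
        if not 0.0 < gamma <= 1.0:
            raise ValueError("gamma must be in (0, 1]")
        self.k = num_curricula
        self.gamma = gamma  # exploration rate mixed in uniformly
        # Store log-weights rather than raw weights so exp() cannot overflow.
        self.log_w = [0.0] * num_curricula
        self.rng = random.Random(seed)

    def probabilities(self) -> List[float]:
        # Normalized weights mixed with a uniform distribution (Exp3 rule).
        m = max(self.log_w)
        w = [math.exp(lw - m) for lw in self.log_w]
        total = sum(w)
        return [(1.0 - self.gamma) * wi / total + self.gamma / self.k for wi in w]

    def select(self) -> int:
        # Sample a curriculum index according to the current probabilities.
        return self.rng.choices(range(self.k), weights=self.probabilities(), k=1)[0]

    def update(self, arm: int, reward: float) -> None:
        # Importance-weighted estimate: only the pulled arm's weight changes.
        reward = min(max(reward, 0.0), 1.0)
        p = self.probabilities()[arm]
        x_hat = reward / p
        self.log_w[arm] += self.gamma * x_hat / self.k


# Toy usage: curriculum 1 is assumed to give the most learning progress.
selector = Exp3CurriculumSelector(num_curricula=3, gamma=0.2, seed=0)
for _ in range(500):
    arm = selector.select()
    # Placeholder for PPO training on `arm` plus a normalized progress signal.
    reward = 0.8 if arm == 1 else 0.2
    selector.update(arm, reward)
print([round(p, 3) for p in selector.probabilities()])
```

Working with log-weights is a numerical-stability choice that leaves the Exp3 probabilities unchanged. Because of the uniform mixing term, every curriculum keeps a selection probability of at least gamma / K, so easier or harder curricula are never abandoned entirely. Over time, probability mass shifts toward curricula that yield higher bandit reward.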
