Reward-Driven Automated Curriculum Learning for Interaction-Aware Self-Driving at Unsignalized Intersections

Zengqi Peng; Xiao Zhou; Lei Zheng; Yubin Wang; Jun Ma

TL;DR

This work tackles safe, efficient interaction-aware autonomous driving at unsignalized intersections under uncertainty in surrounding vehicles' intentions and counts. It introduces Reward-Driven Automated Curriculum Proximal Policy Optimization (RD-ACPPO), which uses an Exp3-inspired multi-armed bandit to automatically sequence curricula with increasing numbers of SVs and integrates this with a PPO-based policy update guided by a carefully designed composite reward. Key contributions include the automated curriculum selection mechanism with target MAB synchronization, a PPO-based training loop operating over dynamically chosen curricula, and extensive validation in Highway_Env and CARLA demonstrating superior task success, robustness to initialization, and adaptability to diverse traffic configurations. The approach improves sample efficiency and safety in complex interaction-heavy driving scenarios, offering a practical pathway to transferable autonomous driving policies for unsignalized intersections.

Abstract

In this work, we present a reward-driven automated curriculum reinforcement learning approach for interaction-aware self-driving at unsignalized intersections that accounts for the uncertainties associated with surrounding vehicles (SVs). These uncertainties cover both the driving intentions of SVs and the number of SVs. To address this problem, the curriculum set is designed to accommodate a progressively increasing number of SVs. An automated curriculum selection mechanism allocates importance weights across the curricula, which improves sample efficiency and training outcomes. Furthermore, the reward function is designed to guide the agent towards effective policy exploration. The proposed framework can thus proactively address these uncertainties at unsignalized intersections through automated curriculum learning that progressively increases task difficulty, which ensures safe self-driving through effective interaction with SVs. Comparative experiments are conducted in Highway_Env, and the results indicate that our approach achieves the highest task success rate among the compared methods, attains strong robustness to initialization parameters of the curriculum selection module, and exhibits superior adaptability to diverse situational configurations at unsignalized intersections. Furthermore, the effectiveness of the proposed method is validated using the high-fidelity CARLA simulator.
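To make the curriculum selection idea concrete, the following is a minimal sketch of a textbook Exp3 bandit whose arms are curricula with different numbers of SVs. It is not the paper's exact mechanism: the class name `Exp3CurriculumSelector`, the `gamma` and `init_weights` parameters, and the assumption that the training feedback has already been scaled to [0, 1] are all illustrative choices, and the target MAB synchronization step mentioned in the TL;DR is not modelled here.

```python
import math
import random


class Exp3CurriculumSelector:
    """Textbook Exp3 over K curricula (arms). Illustrative sketch only."""

    def __init__(self, num_curricula, gamma=0.1, init_weights=None, seed=None):
        self.k = num_curricula
        self.gamma = gamma  # exploration rate in (0, 1]
        # init_weights is a hypothetical hook for equal vs. non-uniform starts.
        self.weights = (
            list(init_weights) if init_weights is not None else [1.0] * self.k
        )
        if len(self.weights) != self.k:
            raise ValueError("init_weights must have one entry per curriculum")
        self.rng = random.Random(seed)

    def probabilities(self):
        """Mix the normalized weights with a uniform exploration term."""
        total = sum(self.weights)
        return [
            (1.0 - self.gamma) * w / total + self.gamma / self.k
            for w in self.weights
        ]

    def select(self):
        """Sample the index of the curriculum to train on next."""
        probs = self.probabilities()
        return self.rng.choices(range(self.k), weights=probs, k=1)[0]

    def update(self, arm, reward):
        """Update the chosen arm; reward is assumed to be scaled to [0, 1]."""
        reward = min(max(reward, 0.0), 1.0)
        # Importance-weighted estimate keeps the reward estimate unbiased.
        estimate = reward / self.probabilities()[arm]
        self.weights[arm] *= math.exp(self.gamma * estimate / self.k)
        # Rescale so weights stay bounded; probabilities depend only on ratios.
        largest = max(self.weights)
        self.weights = [w / largest for w in self.weights]


# Usage: three curricula, e.g. with an increasing number of SVs (assumed setup).
selector = Exp3CurriculumSelector(num_curricula=3, gamma=0.1, seed=0)
arm = selector.select()
selector.update(arm, reward=0.8)  # 0.8 stands in for a scaled training signal
print(selector.probabilities())
```

In a training loop, `select` would pick the curriculum for the next batch of episodes and `update` would be fed a scaled measure of how useful that batch was. Passing different `init_weights` is one way to compare equal and non-uniform initialization of the arms.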

Paper Structure (14 sections, 19 equations, 6 figures, 2 tables, 1 algorithm)

Introduction
Problem Definition
Problem Statement
Learning Environment
Methodology
Task Decomposition and Curriculum Modelling
Automated Curriculum Selection
Reward-Driven Automated Curriculum Proximal Policy Optimization
Experiments
Comparative Experimental Settings
Training Results
Performance Evaluation
Experimental Validation in CARLA
Conclusion

Figures (6)

Figure 1: Overview of the proposed framework for autonomous driving at unsignalized intersections with interactive SVs. In the four-way intersection scenario, the EV is depicted in red, and the SVs are in blue. The solid vehicle and the semi-transparent vehicle represent the start point and the goal point, respectively.
Figure 2: Reward curve comparison among different methods. The training curves are smoothed by the Savitzky-Golay filter.
Figure 3: Probability of arms with exponential initialization weight during the training process.
Figure 4: Probability of arms with equal initialization weight during the training process.
Figure 5: Demonstration of the driving performance attained by the proposed RD-ACPPO method in an unprotected left-turn task. The green car and blue cars represent the EV and SVs under normal driving conditions, respectively. The red cars represent the vehicles that have collided.
...and 1 more figure
