Scenario-Based Curriculum Generation for Multi-Agent Autonomous Driving

Axel Brunnbauer, Luigi Berducci, Peter Priller, Dejan Nickovic, Radu Grosu

TL;DR

This work tackles the challenge of generating diverse, realistic, and progressively challenging training scenarios for multi-agent autonomous driving in CARLA. It introduces MATS-Gym, a framework that unifies scenario specification with multi-agent RL training and auto-curriculum generation by leveraging Scenic and Unsupervised Environment Design principles. The approach employs a dual-curriculum design, combining a generator $\tilde{\pi}$ with a replay buffer and a Maximum Monte Carlo regret estimator to adapt scenario difficulty, evaluated via PPO training on bird's-eye view observations. Key findings show that action-space design profoundly influences learning dynamics and safety, and that adaptive curriculum methods can rapidly tailor scenario distributions to agent capabilities, offering practical gains for robust autonomous driving policies.
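
The TL;DR names a Maximum Monte Carlo (MaxMC) regret estimator and a replay buffer but does not spell out either. The following is a minimal sketch, assuming the form commonly used in the level-replay literature (the average gap between the highest return observed on a scenario and the critic's per-step value estimates) and assuming rank-based replay prioritization. The function names, the `temperature` parameter, and the choice of rank prioritization are illustrative assumptions, not taken from the MATS-Gym code.

```python
from typing import List, Sequence


def max_mc_regret(value_estimates: Sequence[float], max_return: float) -> float:
    """Assumed MaxMC form: mean of (best return seen on the scenario - V(s_t))."""
    if not value_estimates:
        raise ValueError("need at least one value estimate")
    gaps = [max_return - v for v in value_estimates]
    return sum(gaps) / len(gaps)


def rank_replay_probs(regrets: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Assumed rank prioritization: weight (1 / rank)^(1 / temperature), normalized.

    The scenario with the highest regret gets rank 1 and therefore the
    largest replay probability.
    """
    order = sorted(range(len(regrets)), key=lambda i: regrets[i], reverse=True)
    weights = [0.0] * len(regrets)
    for rank, idx in enumerate(order, start=1):
        weights[idx] = (1.0 / rank) ** (1.0 / temperature)
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    print(max_mc_regret([1.0, 2.0, 3.0], max_return=4.0))
    print(rank_replay_probs([0.1, 0.5, 0.3]))
```

In this sketch, a scenario whose critic values are 1, 2, and 3 with a best observed return of 4 receives a regret score of 2, and scenarios with higher scores are sampled from the buffer more often.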

Abstract

The automated generation of diverse and complex training scenarios has been an important ingredient in many complex learning tasks. Especially in real-world application domains such as autonomous driving, auto-curriculum generation is considered vital for obtaining robust and general policies. However, crafting traffic scenarios with multiple, heterogeneous agents is typically considered a tedious and time-consuming task, especially in more complex simulation environments. We introduce MATS-Gym, a Multi-Agent Traffic Scenario framework for training agents in CARLA, a high-fidelity driving simulator. MATS-Gym uses partial scenario specifications to generate traffic scenarios with variable numbers of agents. This paper unifies various existing approaches to traffic scenario description into a single training framework and demonstrates how it can be integrated with techniques from unsupervised environment design to automate the generation of adaptive auto-curricula. The code is available at https://github.com/AutonomousDrivingExaminer/mats-gym.

Paper Structure (11 sections, 6 equations, 5 figures, 1 table, 1 algorithm)

Figures (5)

  • Figure 1: This multi-agent scenario illustrates an intersection where five vehicles navigate according to assigned routes, with three pedestrians observed on the sidewalk adjacent to the ego vehicle. A visual representation of the simulation is depicted in the center, while the scenario description from which the simulation parameters are sampled is described on the right. On the left, we depict other scenarios sampled from the same Scenic description.
  • Figure 2: Learning curves for I-PPO under different action-space definitions, showing the impact on episodic return, collisions, and route completion. We report the mean and standard deviation over 5 consecutive policy updates of the same run.
  • Figure 3: Learning curves of average episodic return and route completion during training and evaluation, averaged over 3 seeds with one standard deviation shown. We also report the average regret of the level buffers of PLR and our DCD approach over timesteps.
  • Figure 4: Evolution of the parameter distribution during training with DCD (top) and domain randomization (bottom). The plot shows the fractions of route types, the mean number of NPCs, the fraction of agents that do not keep safety distances or that ignore traffic lights, and the mean target speed. For each parameter, we report the mean and standard deviation over the batch of training data and indicate the direction of increasing difficulty ($\uparrow$). We observe that DCD steers the parameter distribution towards configurations that are presumably easier to solve: straight crossings, fewer and safer NPCs, etc. We conclude that DCD adapts the training distribution to the agent's performance level faster than domain randomization does.
  • Figure 5: For each pair of maneuver type and number of NPCs, we average the regrets of all scenarios with the same parameters in the buffer at four checkpoints throughout training. We observe that the environment generation policy of DCD concentrates regret more narrowly on fewer configurations, suggesting faster adaptation of the scenario sampling.
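
The aggregation described in the Figure 5 caption, averaging buffer regrets per pair of maneuver type and number of NPCs, can be sketched as follows. The buffer layout (a list of dictionaries with the keys `maneuver`, `num_npcs`, and `regret`) and the example values are hypothetical and chosen only for illustration; the actual buffer format is not given in this summary.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Tuple


def mean_regret_by_config(
    buffer: Iterable[Mapping[str, object]],
) -> Dict[Tuple[str, int], float]:
    """Average the regret of all buffer entries sharing (maneuver, num_npcs)."""
    sums: Dict[Tuple[str, int], float] = defaultdict(float)
    counts: Dict[Tuple[str, int], int] = defaultdict(int)
    for entry in buffer:
        key = (str(entry["maneuver"]), int(entry["num_npcs"]))
        sums[key] += float(entry["regret"])
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


if __name__ == "__main__":
    # Hypothetical buffer snapshot at one checkpoint.
    snapshot = [
        {"maneuver": "straight", "num_npcs": 2, "regret": 0.5},
        {"maneuver": "straight", "num_npcs": 2, "regret": 1.5},
        {"maneuver": "left_turn", "num_npcs": 4, "regret": 2.0},
    ]
    for config, regret in sorted(mean_regret_by_config(snapshot).items()):
        print(config, regret)
```

Running the same aggregation on buffer snapshots from several checkpoints would give one grid of averages per checkpoint, which is the kind of data a heatmap over configurations needs.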