Diverse and Adaptive Behavior Curriculum for Autonomous Driving: A Student-Teacher Framework with Multi-Agent RL

Ahmed Abouelazm, Johannes Ratz, Philip Schörner, J. Marius Zöllner

TL;DR

The student, trained with automatic curricula, outperformed agents trained on rule-based traffic, achieving higher rewards and exhibiting balanced, assertive driving.

Abstract

Autonomous driving faces challenges in navigating complex real-world traffic, requiring safe handling of both common and critical scenarios. Reinforcement learning (RL), a prominent method in end-to-end driving, enables agents to learn through trial and error in simulation. However, RL training often relies on rule-based traffic scenarios, limiting generalization. Additionally, current scenario generation methods focus heavily on critical scenarios, neglecting a balance with routine driving behaviors. Curriculum learning, which progressively trains agents on increasingly complex tasks, is a promising approach to improving the robustness and coverage of RL driving policies. However, existing research mainly emphasizes manually designed curricula, focusing on scenery and actor placement rather than traffic behavior dynamics. This work introduces a novel student-teacher framework for automatic curriculum learning. The teacher, a graph-based multi-agent RL component, adaptively generates traffic behaviors across diverse difficulty levels. An adaptive mechanism adjusts task difficulty based on student performance, ensuring exposure to behaviors ranging from common to critical. The student, though exchangeable, is realized as a deep RL agent with partial observability, reflecting real-world perception constraints. Results demonstrate the teacher's ability to generate diverse traffic behaviors. The student, trained with automatic curricula, outperformed agents trained on rule-based traffic, achieving higher rewards and exhibiting balanced, assertive driving.
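
The abstract says that an adaptive mechanism adjusts task difficulty based on student performance, but it does not state the rule. The sketch below shows one plausible form. It assumes that difficulty is a scalar λ in [0, 1] supplied to the teacher (loosely modeled on the auxiliary input λ in Figure 1) and that student performance is measured as an episode success rate over a sliding window. The class name `DifficultyController`, the thresholds, the window size, and the step size are all hypothetical and are not taken from the paper.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class DifficultyController:
    """Hypothetical adaptive-difficulty rule (not the paper's exact mechanism).

    Assumes difficulty is a scalar lam in [0, 1] fed to the teacher and that
    student performance is an episode success rate over a sliding window.
    """

    lam: float = 0.0          # current difficulty level
    step: float = 0.1         # assumed increment per adjustment
    raise_above: float = 0.8  # assumed success rate that triggers harder traffic
    lower_below: float = 0.4  # assumed success rate that triggers easier traffic
    window: int = 10          # assumed number of episodes per evaluation
    _results: deque = field(default_factory=deque, init=False, repr=False)

    def update(self, success: bool) -> float:
        """Record one student episode and return the (possibly adjusted) lam."""
        self._results.append(success)
        if len(self._results) > self.window:
            self._results.popleft()  # keep only the most recent episodes
        if len(self._results) < self.window:
            return self.lam  # not enough evidence yet
        rate = sum(self._results) / self.window
        if rate >= self.raise_above:
            self.lam = min(1.0, self.lam + self.step)
            self._results.clear()  # re-measure at the new difficulty
        elif rate <= self.lower_below:
            self.lam = max(0.0, self.lam - self.step)
            self._results.clear()
        return self.lam


ctrl = DifficultyController()
for _ in range(10):
    ctrl.update(True)   # ten straight successes
print(ctrl.lam)  # 0.1
for _ in range(10):
    ctrl.update(False)  # ten straight failures
print(ctrl.lam)  # 0.0
```

The two thresholds create a dead band. When the success rate falls between them, the difficulty does not change, which keeps λ from oscillating in response to noisy success estimates.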

Paper Structure

This paper contains 19 sections, 2 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Overview of the proposed behavior curriculum framework. The student (blue) and teacher (red) interact concurrently in the driving environment. The student learns a driving policy from sensor data, guided by a standard reward. The teacher dynamically orchestrates NPC behavior based on a fully observable state representation, the student's performance, and an auxiliary input $\lambda$.
  • Figure 2: The teacher network architecture first encodes the agents' history and map lane graph separately, followed by hierarchical interaction fusion, ensuring each NPC embedding incorporates information about the map topology, road layout, surrounding NPCs, and the student. The NPC embeddings are concatenated and combined with a linear projection of the auxiliary input before being processed by the actor-critic MLP, which outputs the policies and value functions for each NPC.
  • Figure 3: The proposed algorithm consists of three sequential steps. Initially, the teacher is trained for $N_{\text{teacher}}$ iterations to refine its NPC behavior policy. This is followed by a recalibration phase that determines the initial difficulty level of the behavior curriculum. Finally, the student driving policy is trained for $N_{\text{student}}$ iterations under a curriculum with progressively increasing difficulty.
  • Figure 4: Exemplary scenarios of the teacher's behavior generation across three difficulty levels, with the NPCs highlighted in red and the student highlighted in blue. Screenshots are arranged in a temporal sequence from left to right. Moving vehicles are marked with a bounding box. The black and white flag indicates the goal position of the student.
  • Figure 5: Comparison of student velocity profiles across different traffic conditions. Each plot shows two students: one trained with curriculum learning (CL) and one trained with rule-based traffic. The rule-based student adopts an exploitative policy, often waiting passively for NPCs to clear the intersection. In contrast, the CL student navigates traffic more assertively and maintains a smoother, more balanced velocity.
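
The three phases in Figure 3 can be written out as a training schedule. What follows is a minimal sketch built on several assumptions. The teacher and student are replaced by hypothetical callables (`train_teacher`, `train_student`, `evaluate_student`). Difficulty levels form a discrete grid of λ values. The teacher phase cycles through those levels. Recalibration picks the hardest level at which the student already meets a hypothetical success target. The student phase moves up one level each time that target is met. The paper does not specify any of these rules, and `ToyStudent` exists only so the schedule can run end to end.

```python
from typing import Callable, List


def run_behavior_curriculum(
    train_teacher: Callable[[float], None],
    train_student: Callable[[float], None],
    evaluate_student: Callable[[float], float],
    lam_levels: List[float],
    n_teacher: int,
    n_student: int,
    target_success: float = 0.7,
) -> List[float]:
    """Run the three phases in order; return the difficulty used at each student iteration."""
    levels = sorted(lam_levels)

    # Phase 1: train the teacher for n_teacher iterations. Cycling through all
    # difficulty levels is an assumed sampling scheme.
    for i in range(n_teacher):
        train_teacher(levels[i % len(levels)])

    # Phase 2: recalibration. Assumed rule: start at the hardest level at which
    # the student already reaches the target success rate (else the easiest).
    idx = 0
    for j, lam in enumerate(levels):
        if evaluate_student(lam) >= target_success:
            idx = j

    # Phase 3: train the student for n_student iterations. Difficulty only
    # increases, one level at a time, once the student meets the target.
    history = []
    for _ in range(n_student):
        lam = levels[idx]
        train_student(lam)
        history.append(lam)
        if evaluate_student(lam) >= target_success and idx < len(levels) - 1:
            idx += 1
    return history


class ToyStudent:
    """Hypothetical stand-in: succeeds iff lam <= skill; each training step adds skill."""

    def __init__(self) -> None:
        self.skill = 0.25

    def train(self, lam: float) -> None:
        self.skill += 0.125

    def evaluate(self, lam: float) -> float:
        return 1.0 if lam <= self.skill else 0.0


student = ToyStudent()
teacher_calls: List[float] = []
history = run_behavior_curriculum(
    train_teacher=teacher_calls.append,
    train_student=student.train,
    evaluate_student=student.evaluate,
    lam_levels=[0.0, 0.25, 0.5, 0.75, 1.0],
    n_teacher=10,
    n_student=6,
)
print(history)  # [0.25, 0.5, 0.75, 0.75, 1.0, 1.0]
```

Phase 3 never lowers the difficulty. This follows the "progressively increasing difficulty" wording in the Figure 3 caption; a controller that can also step down would be an equally valid variant.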