Informed Reinforcement Learning for Situation-Aware Traffic Rule Exceptions

Daniel Bogdoll, Jing Qin, Moritz Nekolla, Ahmed Abouelazm, Tim Joseph, J. Marius Zöllner

TL;DR

This work introduces Informed Reinforcement Learning, a framework that augments RL with a structured rulebook to handle traffic rule exceptions in autonomous driving. It learns trajectories in Frenet space and uses a situation-aware reward shaped by Linear Temporal Logic-based rule realizations and hierarchy coefficients, enabling dynamic prioritization of rules. Tested on a CARLA anomaly benchmark with 1,000 scenarios using DreamerV3 and Rainbow agents, the approach yields faster learning and robust performance in scenarios requiring controlled rule exceptions. The key contributions are the rulebook-integrated reward, Frenet-space trajectory generation, and a scalable, situation-aware decision mechanism that integrates real-world traffic rules into RL training and evaluation.

Abstract

Reinforcement Learning is a highly active research field with promising advancements. In the field of autonomous driving, however, often only very simple scenarios are examined. Common approaches use non-interpretable control commands as the action space and reward designs which lack structure. In this work, we introduce Informed Reinforcement Learning, where a structured rulebook is integrated as a knowledge source. We learn trajectories and assess them with a situation-aware reward design, leading to a dynamic reward which allows the agent to learn situations which require controlled traffic rule exceptions. Our method is applicable to arbitrary RL models. We successfully demonstrate high completion rates of complex scenarios with recent model-based agents.

Paper Structure (15 sections, 5 equations, 7 figures, 2 tables)

Figures (7)

  • Figure 1: Architecture of our approach. We use Curriculum Learning, where normal scenarios are used first to learn basic driving behavior. Then, anomalies are provided to learn controlled rule exceptions. Given an observation $o_t$, the Reinforcement Learning agent chooses an action $a_t$ as the parametric input for generating a trajectory $\tau_t$. The rulebook then evaluates the trajectory in the context of an abstracted environment $\hat{o}_t$ and provides the partial reward $r_{RB,t}$. Finally, a controller follows the trajectory. During evaluation, only the path in green is executed.
  • Figure 2: Graph representation of a hierarchical rulebook $\Psi$ with rule realizations $\psi_i$ and hierarchy coefficients $\rho_j$, where $j$ indicates the hierarchy index.
  • Figure 3: Atypical traffic scenario with an anomaly. The ego vehicle's lane is blocked, which permits a controlled rule exception. Adapted from Qin_Reinforcement_2023_MA.
  • Figure 4: Dynamic action space in Frenet space. The left side shows a scenario where the ego vehicle is in its intended lane; on the right side, it is in the opposite lane.
  • Figure 5: Evaluation of the arrived distance and finished score during training, showing the running average, standard deviation, and 5th and 95th percentiles. We compare agents that output either direct controls or a trajectory and that use either a conservative reward or our rulebook.
  • ...and 2 more figures
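
The captions of Figures 1 and 4 describe the action $a_t$ as a parametric input from which a trajectory $\tau_t$ is generated in Frenet space. The following is a minimal sketch of that idea, not the authors' generator. It assumes the action is a pair (target lateral offset, constant longitudinal speed), a quintic blend for the lateral motion, and a polyline reference path. The function names, the action layout, and the 3-second horizon are all hypothetical.

```python
import numpy as np

def quintic_lateral_profile(d0, d1, horizon, n=20):
    """Lateral offset d(t) moving from d0 to d1 within `horizon` seconds.
    Assumed boundary conditions: zero lateral velocity and acceleration
    at both ends (a standard quintic smooth step)."""
    t = np.linspace(0.0, horizon, n)
    u = t / horizon
    blend = 10 * u**3 - 15 * u**4 + 6 * u**5  # goes smoothly from 0 to 1
    return t, d0 + (d1 - d0) * blend

def frenet_to_cartesian(s, d, ref_xy):
    """Map Frenet coordinates (arc length s, lateral offset d) to x/y,
    using a polyline reference path `ref_xy` of shape (N, 2)."""
    seg = np.diff(ref_xy, axis=0)
    seg_len = np.linalg.norm(seg, axis=1)
    cum = np.concatenate([[0.0], np.cumsum(seg_len)])
    # Index of the polyline segment that contains each s value.
    idx = np.clip(np.searchsorted(cum, s, side="right") - 1, 0, len(seg) - 1)
    tangent = seg[idx] / seg_len[idx][:, None]
    normal = np.stack([-tangent[:, 1], tangent[:, 0]], axis=1)  # points left
    base = ref_xy[idx] + tangent * (s - cum[idx])[:, None]
    return base + normal * d[:, None]

def generate_trajectory(action, s0, d0, ref_xy, horizon=3.0):
    """Hypothetical action layout: (target lateral offset [m], speed [m/s])."""
    d_target, speed = action
    t, d = quintic_lateral_profile(d0, d_target, horizon)
    s = s0 + speed * t  # constant-speed assumption along the reference path
    return frenet_to_cartesian(s, d, ref_xy)

# Straight reference lane along the x-axis; shift 3.5 m to the left at 10 m/s.
reference = np.array([[0.0, 0.0], [100.0, 0.0]])
trajectory = generate_trajectory((3.5, 10.0), s0=0.0, d0=0.0, ref_xy=reference)
```

Expressing the action as a lateral target relative to the reference lane, rather than as raw steering and throttle, is what keeps the action space interpretable: a positive offset of roughly one lane width corresponds to the "opposite lane" case shown on the right of Figure 4.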

Theorems & Definitions (2)

  • Definition III.1: Rule Realization
  • Definition III.2: Rulebook
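
The two definitions, together with Figure 2, suggest a rulebook $\Psi$ made of rule realizations $\psi_i$ grouped into hierarchy levels with coefficients $\rho_j$. The sketch below illustrates one possible reading, under loud assumptions: each rule is a per-state predicate that must hold at every trajectory point (an LTL "globally" check over a finite trace), and the partial reward is a negative sum of the coefficients of violated rules. The paper's actual reward formula is not reproduced in this section, so the class names, state keys, and coefficient values are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# Hypothetical abstracted state for one trajectory point.
State = Dict[str, float]

@dataclass
class RuleRealization:
    name: str
    level: int                      # hierarchy index j
    holds: Callable[[State], bool]  # per-state predicate (assumed form)

    def satisfied(self, trajectory: Sequence[State]) -> bool:
        # "Globally" over a finite trace: the predicate holds at every point.
        return all(self.holds(state) for state in trajectory)

@dataclass
class Rulebook:
    rules: List[RuleRealization]
    coefficients: Dict[int, float]  # rho_j, one weight per hierarchy level

    def reward(self, trajectory: Sequence[State]) -> float:
        # Assumed aggregation: penalize each violated rule by its level weight.
        penalty = 0.0
        for rule in self.rules:
            if not rule.satisfied(trajectory):
                penalty += self.coefficients[rule.level]
        return -penalty

rulebook = Rulebook(
    rules=[
        RuleRealization("no_collision", 0, lambda s: s["min_gap"] > 0.5),
        RuleRealization("stay_in_lane", 1, lambda s: abs(s["lane_offset"]) < 1.0),
    ],
    coefficients={0: 10.0, 1: 1.0},  # safety outranks lane keeping
)

# Blocked lane: staying in lane hits the obstacle, swerving leaves the lane.
stay_in_lane = [{"min_gap": 0.0, "lane_offset": 0.0}]
swerve = [{"min_gap": 2.0, "lane_offset": 3.5}]
reward_stay = rulebook.reward(stay_in_lane)
reward_swerve = rulebook.reward(swerve)
```

With these illustrative weights, the swerving trajectory scores higher than the colliding one, which is the kind of controlled rule exception shown in Figure 3: a lower-priority rule is violated so that a higher-priority one is kept.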