
Context-Aware Model-Based Reinforcement Learning for Autonomous Racing

Emran Yasser Moustafa, Ivana Dusparic

TL;DR

The paper addresses robust generalization of model-based RL for autonomous driving in a competitive, non-stationary setting. It introduces cMask, a context-aware extension of DreamerV3 in which a Soft Actor-Critic (SAC) agent generates a mask that selectively applies the episode context $c=[c_v,c_\theta]$ (the adversaries' velocity and steering contexts) to a contextual recurrent state-space model (cRSSM) world model. This improves handling of changing adversary behaviors in Roboracer. Empirical results show that context-aware methods, especially cMask, learn safer policies and generalize better to out-of-distribution contexts than context-free baselines such as DreamerV3 and SAC. The work demonstrates the practical value of context masking for safety-critical, multi-agent robotics and outlines directions for handling fixed, observable contexts and extending to more flexible context representations.
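To make the masking idea concrete, here is a minimal sketch of selectively applying $c=[c_v,c_\theta]$ before it reaches the world model. It assumes the SAC agent outputs one logit per context dimension, that a sigmoid plus a 0.5 threshold turns these into a binary mask, and that the world model consumes the masked context by concatenation. The helpers `apply_context_mask` and `condition_latent` are hypothetical and are not the paper's implementation.

```python
import numpy as np


def apply_context_mask(context: np.ndarray, mask_logits: np.ndarray,
                       threshold: float = 0.5) -> np.ndarray:
    """Keep only the context dimensions selected by the masking policy.

    Hypothetical: `mask_logits` stands in for the SAC agent's action, one
    entry per context dimension. Squashing with a sigmoid and thresholding
    to a binary mask is an assumption, not the paper's stated rule.
    """
    probs = 1.0 / (1.0 + np.exp(-mask_logits))    # per-dimension keep probability
    mask = (probs >= threshold).astype(context.dtype)
    return mask * context                          # masked dimensions become 0


def condition_latent(latent: np.ndarray, masked_context: np.ndarray) -> np.ndarray:
    """Hypothetical conditioning step: append the masked context to the latent
    state as a stand-in for how a context-conditioned RSSM consumes c."""
    return np.concatenate([latent, masked_context])


# Episode context c = [c_v, c_theta]; the logits are illustrative values only.
c = np.array([0.3, -0.3])
masked = apply_context_mask(c, np.array([2.0, -2.0]))  # keeps c_v, drops c_theta
z = condition_latent(np.zeros(4), masked)
```

Zeroing a masked dimension, rather than deleting it, keeps the context input at a fixed size, so the world model's architecture does not change as the mask changes.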

Abstract

Autonomous vehicles have shown promise as a groundbreaking technology for improving the safety of road users. For these vehicles, as well as many other safety-critical robotic technologies, to be deployed in real-world applications, we require algorithms that generalize well to unseen scenarios and data. Model-based reinforcement learning (MBRL) algorithms have demonstrated state-of-the-art performance and data efficiency across a diverse set of domains. However, these algorithms have also shown susceptibility to changes in the environment and its transition dynamics. In this work, we explore the performance and generalization capabilities of MBRL algorithms for autonomous driving, specifically in the simulated autonomous racing environment Roboracer (formerly F1Tenth). We frame the head-to-head racing task as a learning problem using contextual Markov decision processes and parameterize the driving behavior of the adversaries using the context of the episode, thereby also parameterizing the transition and reward dynamics. We benchmark the behavior of MBRL algorithms in this environment and propose cMask, a novel context-aware extension of existing MBRL methods. We demonstrate that context-aware MBRL algorithms generalize better to out-of-distribution adversary behaviors than context-free approaches. We also demonstrate that cMask displays strong generalization capabilities, as well as further performance improvements over other context-aware MBRL approaches when racing against adversaries with in-distribution behaviors.
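In the contextual MDP framing, one context is drawn per episode and held fixed while the episode runs, and a context is out of distribution when it falls outside the range seen during training. The sketch below illustrates that split. The uniform box distribution, the shared bounds for both dimensions, and the `[-0.1, 0.1]` training range are hypothetical choices for illustration, not values from the paper.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class RaceContext:
    """Episode context c = [c_v, c_theta] governing adversary behavior."""
    c_v: float      # velocity context (compare Figure 1)
    c_theta: float  # steering context (compare Figure 2)

    def as_list(self) -> list:
        return [self.c_v, self.c_theta]


def sample_context(rng: random.Random, low: float, high: float) -> RaceContext:
    """Draw one context at episode reset; it then stays fixed for the episode.
    Assumption: both dimensions are sampled uniformly from [low, high]."""
    return RaceContext(rng.uniform(low, high), rng.uniform(low, high))


def in_distribution(ctx: RaceContext, low: float, high: float) -> bool:
    """True if every context dimension lies inside the training range."""
    return all(low <= x <= high for x in ctx.as_list())


TRAIN_LOW, TRAIN_HIGH = -0.1, 0.1  # hypothetical training range
rng = random.Random(0)
train_ctx = sample_context(rng, TRAIN_LOW, TRAIN_HIGH)
ood_ctx = RaceContext(c_v=0.3, c_theta=0.0)  # faster adversary than seen in training
```

Because the context changes the adversaries' behavior, it also changes the transitions and rewards the ego agent experiences, which is why a single context-free world model can struggle when `ood_ctx`-style episodes appear at test time.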

Paper Structure

This paper contains 21 sections, 3 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Velocity profiles of adversaries in ESP track subject to different velocity contexts. All adversaries in this figure have a steering context $c_\theta = +0.0$. The orange, green and blue lines correspond to the velocity profiles of adversaries with a velocity context equal to $c_v = -0.3$, $c_v= +0.0$ and $c_v = +0.3$, respectively.
  • Figure 2: Race lines of adversaries in ESP track subject to different steering contexts. All adversaries in this figure have a velocity context $c_v = +0.0$. Race lines colored in orange, green and blue correspond to the routes taken by adversaries with a steering context equal to $c_\theta = -0.3$, $c_\theta = +0.0$ and $c_\theta = +0.3$, respectively.
  • Figure 3: Track layouts of the ESP (left) and GBR (right) tracks.
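Figures 1 and 2 show that $c_v$ shifts an adversary's velocity profile and $c_\theta$ changes its race line, but the captions do not give the exact mapping. The sketch below illustrates one hypothetical way a context could perturb an adversary's reference commands: each context acts as a relative scaling, with steering clipped to a limit. The function `adversary_command`, the multiplicative form, and the clipping are all assumptions made for illustration.

```python
def adversary_command(ref_speed: float, ref_steer: float,
                      c_v: float, c_theta: float,
                      max_steer: float) -> tuple:
    """Hypothetical context-perturbed adversary controller.

    Assumption: c_v scales the reference speed and c_theta scales the
    reference steering angle, so c = +0.0 reproduces the reference behavior.
    """
    speed = max(0.0, (1.0 + c_v) * ref_speed)          # never drive backwards
    steer = (1.0 + c_theta) * ref_steer
    steer = min(max(steer, -max_steer), max_steer)    # respect the steering limit
    return speed, steer


# A slower adversary (c_v = -0.3) with unchanged steering (c_theta = +0.0).
slow_speed, slow_steer = adversary_command(5.0, 0.1, c_v=-0.3, c_theta=0.0, max_steer=0.4)
```

Under this assumption, contexts of $\pm 0.3$ correspond to roughly 30% slower or faster driving, or 30% weaker or stronger steering, relative to the $c = +0.0$ reference adversary.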