Reward (Mis)design for Autonomous Driving

W. Bradley Knox, Alessandro Allievi, Holger Banzhaf, Felix Schmitt, Peter Stone

TL;DR

The paper tackles the critical but underexplored problem of reward design for autonomous driving by introducing 8 sanity checks to diagnose flaws in reward and cost functions. It systematically applies these checks to published RL-for-AD reward functions, revealing pervasive issues such as unsafe reward shaping and misalignment with human preferences. The authors discuss broader design directions, including learning reward functions, multi-objective optimization, and monetizing outcomes with a financial utility, to improve reliability and alignment with stakeholder interests. The work argues that robust reward design is essential for safe, scalable deployment of RL in autonomous driving and provides a practical framework for future research and evaluation beyond RL.

Abstract

This article considers the problem of diagnosing certain common errors in reward design. Its insights are also applicable to the design of cost functions and performance metrics more generally. To diagnose common errors, we develop 8 simple sanity checks for identifying flaws in reward functions. These sanity checks are applied to reward functions from past work on reinforcement learning (RL) for autonomous driving (AD), revealing near-universal flaws in reward design for AD that might also exist pervasively across reward design for other tasks. Lastly, we explore promising directions that may aid the design of reward functions for AD in subsequent research, following a process of inquiry that can be adapted to other domains.

Paper Structure

This paper contains 93 sections, 35 equations, 4 figures, 1 table.

Figures (4)

  • Figure 1: Relationships of terminology among related fields. Note that fitness and objective are similar in meaning to return but not identical, as we describe in Section \ref{sec:background}.
  • Figure 2: Illustrations of the abstract trajectories used in Sections \ref{sec:preforder} and \ref{sec:indifference}.
  • Figure 3: Estimates of the kilometers per collision at which various published reward functions are indifferent between safely declining to move and driving with that collision rate. Higher values indicate stronger safety requirements. Publications are referenced by the first 3 letters of their first author's name and the last two digits of their publication year. Che19 refers to the publication by Jianyu Chen. The 3 points on the right designate estimates of actual km per collision for the age groups among US drivers with the most km per collision (50--60 year olds) and the least (16--17 year olds) \cite{tefft2017rates}, as well as a rough estimate of km per collision for a drunk 16--17 year old (from applying a 37x risk for blood alcohol concentration $\geq 0.08$, as estimated by \cite{peck2007improved}). *The task domain of \cite{jaritz2018end} was presented as a racing video game and therefore should not be judged by real-world safety standards.
  • Figure 4: Illustration of a common learnable loophole, in which the agent moves in circles to repeatedly collect reward for progress, never reaching the goal.
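
The indifference point described in the Figure 3 caption can be illustrated with a small calculation. The sketch below is not the paper's procedure. It assumes a deliberately simplified reward function: an undiscounted return, zero return for declining to move, a constant positive reward per km driven, and a fixed penalty per collision. The function names and the numbers in the usage example are hypothetical.

```python
def indifference_km_per_collision(reward_per_km: float,
                                  collision_penalty: float) -> float:
    """Return the km-per-collision rate at which driving and idling tie.

    Simplifying assumptions (not taken from the paper): undiscounted return,
    idling earns 0, driving earns `reward_per_km` per km, and each collision
    costs `collision_penalty`.
    """
    if reward_per_km <= 0 or collision_penalty <= 0:
        raise ValueError("both arguments must be positive")
    # Expected return per km when driving at k km per collision:
    #     reward_per_km - collision_penalty / k
    # Setting this equal to the idle return (0) gives k = penalty / reward.
    return collision_penalty / reward_per_km


def prefers_driving(km_per_collision: float,
                    reward_per_km: float,
                    collision_penalty: float) -> bool:
    """True if driving at the given collision rate beats idling (return 0)."""
    expected_return_per_km = reward_per_km - collision_penalty / km_per_collision
    return expected_return_per_km > 0.0


if __name__ == "__main__":
    # Hypothetical reward function: +1 per km, -100 per collision.
    k_star = indifference_km_per_collision(reward_per_km=1.0,
                                           collision_penalty=100.0)
    print(f"indifference point: {k_star} km per collision")
    print(prefers_driving(200.0, 1.0, 100.0))  # safer than the indifference point
    print(prefers_driving(50.0, 1.0, 100.0))   # less safe than the indifference point
```

Under these assumptions, any policy that crashes less often than once per `k_star` km is preferred to staying still. A low `k_star` therefore corresponds to a weak safety requirement, which matches the caption's reading that higher values indicate stronger safety requirements.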