Reward Signal Design for Autonomous Racing

Benjamin Evans, Herman A. Engelbrecht, Hendrik W. Jordaan

TL;DR

The problem of reward signal design for robotic control in the context of local planning for autonomous racing is addressed and a novel method of rewarding the agent on its state relative to an optimal trajectory is presented.

Abstract

Reinforcement learning (RL) has been shown to be a valuable tool in training neural networks for autonomous motion planning. Applying RL to a specific problem depends on a reward signal that quantifies how good or bad a certain action is. This paper addresses the problem of reward signal design for robotic control in the context of local planning for autonomous racing. We aim to design reward signals that perform well across multiple, competing, continuous metrics. Three methodologies, namely position-based, velocity-based, and action-based rewards, are considered and evaluated in the context of F1/10th racing. A novel method of rewarding the agent on its state relative to an optimal trajectory is presented. Agents are trained and tested in simulation, and the behaviors generated by the reward signals are compared on the basis of average lap time and completion rate. The results indicate that a reward based on the distance and velocity relative to a minimum curvature trajectory produces the fastest lap times.

Paper Structure

This paper contains 18 sections, 4 equations, 10 figures, 4 tables.

Figures (10)

  • Figure 1: Reinforcement Learning Framework: An RL agent uses a state vector to select an action that is implemented on a racing car. We ask how the agent should be rewarded based on the car's performance.
  • Figure 2: Distance-Based Reward: The vehicle's position is projected onto the line and used to measure the change in progress between time steps.
  • Figure 3: Cross-track & Heading Reward: Illustration of how cross-track distance, $d_\text{c}$, and heading error, $\theta$, are measured.
  • Figure 4: Modification planner showing how a path follower and neural network are used in parallel to avoid obstacles while maintaining a reference trajectory.
  • Figure 5: Example scenario from simulator showing a non-holonomic vehicle on a track and the relevant range finder readings. The range finders are equally spaced in front of the vehicle and limited to a maximum range.
  • ...and 5 more figures
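
The quantities named in the captions of Figures 2 and 3 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the reference line is assumed to be a simple open polyline of waypoints, and the function names (`project_onto_polyline`, `progress_between_steps`, `cross_track_and_heading`) are hypothetical. How these quantities are weighted into a final reward is not given on this page, so the sketch only computes the raw measurements.

```python
import math


def project_onto_polyline(point, waypoints):
    """Project `point` onto an open polyline (assumed reference line).

    Returns (arc_length, distance, segment_heading) for the closest projection.
    """
    px, py = point
    best = None
    s_start = 0.0  # arc length at the start of the current segment
    for (x0, y0), (x1, y1) in zip(waypoints, waypoints[1:]):
        dx, dy = x1 - x0, y1 - y0
        seg_len = math.hypot(dx, dy)
        if seg_len == 0.0:
            continue  # skip duplicated waypoints
        # Fraction along the segment of the perpendicular foot, clamped to it.
        t = ((px - x0) * dx + (py - y0) * dy) / (seg_len ** 2)
        t = max(0.0, min(1.0, t))
        cx, cy = x0 + t * dx, y0 + t * dy
        dist = math.hypot(px - cx, py - cy)
        if best is None or dist < best[1]:
            best = (s_start + t * seg_len, dist, math.atan2(dy, dx))
        s_start += seg_len
    return best


def progress_between_steps(prev_pos, pos, waypoints):
    """Change in projected arc length between two time steps (Figure 2 idea)."""
    s_prev, _, _ = project_onto_polyline(prev_pos, waypoints)
    s_now, _, _ = project_onto_polyline(pos, waypoints)
    return s_now - s_prev


def cross_track_and_heading(pos, heading, waypoints):
    """Cross-track distance d_c and heading error theta (Figure 3 idea)."""
    _, d_c, seg_heading = project_onto_polyline(pos, waypoints)
    diff = heading - seg_heading
    theta = math.atan2(math.sin(diff), math.cos(diff))  # wrap to [-pi, pi]
    return d_c, theta


# Hypothetical L-shaped reference line, used only for illustration.
line = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]
progress = progress_between_steps((1.0, 0.5), (3.0, 0.5), line)
d_c, theta = cross_track_and_heading((3.0, 0.5), 0.1, line)
```

The sketch treats the line as open, so progress across the start/finish line of a closed track would need the arc length to be wrapped by the total lap length.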