Design of Reward Function on Reinforcement Learning for Automated Driving

Takeru Goto, Yuki Kizumi, Shun Iwasaki

TL;DR

This paper addresses the gap in reinforcement learning for automated driving where the policy must be guided by the quality of the driving process rather than solely reaching a destination. The authors introduce a general design scheme for process-oriented rewards, where multiple evaluation items are mapped to [0,1] and multiplied to form the temporary reward, with terminal-state adjustments to ensure progress toward goals. They validate the approach by applying it to circuit driving and highway cruising in a TORCS-based simulation, showing learned behaviors such as optimized cornering and prudent lane changes. The work offers a tunable framework for incorporating safety, comfort, and traffic-awareness into RL for automated driving, with future directions toward risk-sensitive RL and inverse reinforcement learning to blend hand-crafted and learned rewards.
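The multiplicative scheme summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the two evaluation items (`speed_item`, `lane_center_item`), their shapes, and every numeric parameter are hypothetical, chosen only to show how scores in [0,1] combine by multiplication. The terminal-state adjustment is left out because the summary does not specify its form.

```python
import math
from typing import Callable, Dict, Sequence

# A driving state as a plain dict; the keys used here are illustrative assumptions.
State = Dict[str, float]
EvalItem = Callable[[State], float]


def speed_item(state: State, target: float = 30.0, width: float = 10.0) -> float:
    """Hypothetical item: Gaussian-shaped score, 1.0 at the target speed (m/s)."""
    return math.exp(-((state["speed"] - target) / width) ** 2)


def lane_center_item(state: State, half_width: float = 1.75) -> float:
    """Hypothetical item: 1.0 at the lane centre, falling linearly to 0.0 at the edge."""
    return max(0.0, 1.0 - abs(state["lateral_offset"]) / half_width)


def process_reward(state: State, items: Sequence[EvalItem]) -> float:
    """Multiply the evaluation items, each clamped to [0, 1], into one step reward."""
    reward = 1.0
    for item in items:
        score = min(1.0, max(0.0, item(state)))  # enforce the [0, 1] range
        reward *= score
    return reward


if __name__ == "__main__":
    items = [speed_item, lane_center_item]
    print(process_reward({"speed": 30.0, "lateral_offset": 0.875}, items))
```

Because the items are multiplied rather than summed, any single item that scores 0 drives the whole step reward to 0, so a high score on one criterion cannot compensate for a failure on another.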

Abstract

This paper proposes a design scheme for reward functions that continuously evaluate both driving states and actions, for applying reinforcement learning to automated driving. In reinforcement learning, reward functions often evaluate only whether the goal is achieved, assigning values such as +1 for success and -1 for failure. Such a reward function can yield a policy that achieves the goal, but it does not evaluate the process by which the goal is reached. In automated driving, however, the process of reaching a destination matters: maintaining velocity, avoiding risk, keeping distance from other cars, and keeping passengers comfortable. A reward function designed with the proposed scheme therefore suits automated driving because it evaluates the driving process. The effects of the proposed scheme are demonstrated on simulated circuit driving and highway cruising. Asynchronous Advantage Actor-Critic is used, and models are trained under several situations for generalization. In circuit driving, the results show that appropriate driving positions are obtained, such as traveling on the inside of corners, along with rapid deceleration to take sharp curves. In highway cruising, the ego vehicle learns to change lanes in the presence of other vehicles, decelerating suitably to avoid catching up to a front vehicle and accelerating so that a rear vehicle does not catch up to it.

Paper Structure

This paper contains 12 sections, 15 equations, 9 figures, 2 tables.

Figures (9)

  • Figure 1: Confirming the effect of equation (\ref{eq:r}) by calculating action values
  • Figure 2: Evaluation function for each purpose
  • Figure 3: Outline of the control scheme using the generated next objective point
  • Figure 4: Evaluation functions for circuit driving
  • Figure 5: Evaluation functions for highway cruising
  • ...and 4 more figures