Vision based driving agent for race car simulation environments

Gergely Bári, László Palkovics

TL;DR

The paper tackles time-optimal driving at the tire grip limit with a vision-based DRL approach in the TORCS simulator, aiming to match professional lap times from pixel observations alone. An end-to-end CNN agent trained with Proximal Policy Optimization (PPO) maps the visual state to an action vector $a_t \in [-1,1]^2$ representing throttle/brake and steering, operating at 20 Hz. A dense reward combines a time-difference signal, termination penalties, and an action penalty. Training across parallel TORCS instances runs to roughly $1.5 \cdot 10^9$ steps and reveals emergent grip-limit driving patterns such as racing-line optimization and antilock-like behavior without wheel-speed data. The results demonstrate that DRL can learn professional-level, grip-limit driving from vision alone, with implications for autonomous racing and for a possible transfer to road-car control under uncertain tire grip. Future work includes independent wheel control and integration of ABS/TC systems for further realism and robustness.

Abstract

In recent years, autonomous driving has become a popular field of study. As control at the tire grip limit is essential during emergency situations, algorithms developed for racecars are useful for road cars too. This paper examines the use of Deep Reinforcement Learning (DRL) to solve the problem of grip-limit driving in a simulated environment. The Proximal Policy Optimization (PPO) method is used to train an agent to control the steering wheel and pedals of the vehicle, using only visual inputs to achieve professional human lap times. The paper outlines the formulation of the task of time-optimal driving on a race track as a deep reinforcement learning problem, and explains the chosen observations, actions, and reward functions. The results demonstrate human-like learning and driving behavior that utilizes the maximum tire grip potential.
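
To make the formulation summarized above more concrete, the following is a minimal Python sketch of how a two-component action in $[-1,1]^2$ might be decoded into pedal and steering commands, and how a dense per-step reward might combine a time-difference term, a termination penalty, and an action penalty. It is an illustration under stated assumptions, not the authors' implementation: the sign convention for throttle versus brake, the form of the action penalty (squared change between consecutive actions), the weights, and all names (`Controls`, `decode_action`, `step_reward`, `time_gain`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Controls:
    throttle: float  # in [0, 1]
    brake: float     # in [0, 1]
    steering: float  # in [-1, 1]


def clip(x, lo, hi):
    """Clamp x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def decode_action(action):
    """Map an action in [-1, 1]^2 to pedal and steering commands.

    Assumption: the first component is the combined throttle/brake axis,
    with positive values meaning throttle and negative values meaning brake.
    """
    pedal = clip(action[0], -1.0, 1.0)
    steering = clip(action[1], -1.0, 1.0)
    return Controls(
        throttle=max(pedal, 0.0),
        brake=max(-pedal, 0.0),
        steering=steering,
    )


def step_reward(time_gain, terminated, action, prev_action,
                termination_penalty=1.0, action_weight=0.01):
    """Combine the three reward terms into one per-step value.

    time_gain: hypothetical time-difference signal for this step.
    The action penalty here is the squared change between consecutive
    actions, and both weights are placeholders (assumptions).
    """
    change = sum((a - b) ** 2 for a, b in zip(action, prev_action))
    reward = time_gain - action_weight * change
    if terminated:
        reward -= termination_penalty
    return reward
```

With a combined throttle/brake axis, the agent cannot command both pedals at once, which keeps the action space two-dimensional. Calling `decode_action((0.5, -0.25))` in this sketch gives half throttle, no brake, and a steering command of -0.25.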

Paper Structure

This paper contains 6 sections, 3 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: Scheme of the so-called Markov Decision Process, formalizing the Agent-Environment interaction in Reinforcement Learning problems [sutton_reinforcement_2018]
  • Figure 2: An episode showing that the Agent learns braking and takes the hairpin, but fails to turn left in the following (first left) corner
  • Figure 3: An episode showing that the Agent learned to drive the full lap for the first time
  • Figure 4: Lap time evolution during learning
  • Figure 5: A plot showing human pro-like driving behaviour.
  • ...and 1 more figure