Vision based driving agent for race car simulation environments
Gergely Bári, László Palkovics
TL;DR
The paper tackles time-optimal driving at the tire grip limit using a vision-based deep reinforcement learning (DRL) approach in the TORCS simulator, aiming to match professional lap times from pixel observations alone. An end-to-end CNN agent trained with Proximal Policy Optimization (PPO) maps the visual state to an action vector $a_t \in [-1,1]^2$ representing throttle/brake and steering, at a control rate of 20 Hz. A dense reward combines a time-difference signal, termination penalties, and an action penalty. Training runs across parallel TORCS instances for roughly $1.5 \cdot 10^9$ steps and reveals emergent grip-limit driving patterns such as racing-line optimization and antilock-like behavior, learned without wheel-speed data. The results demonstrate that DRL can learn professional-level, grip-limit driving from vision alone, with implications for both autonomous racing and potential transfer to road-car control under uncertain tire grip. Future work includes independent wheel control and integration of ABS/TC systems for further realism and robustness.
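To make the structure of the dense reward concrete, the following is a minimal sketch of how its three named terms could be combined for a single control step. The function name `step_reward`, the `RewardWeights` container, all weight values, and the quadratic form of the action penalty are assumptions made for illustration; the summary above does not specify them. The time-difference signal is assumed here to be the time gained, in seconds, relative to some reference during the step.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RewardWeights:
    # Illustrative weights only; these are assumptions, not values from the paper.
    time_gain: float = 1.0
    termination: float = 10.0
    action: float = 0.01


def step_reward(
    time_gain_s: float,
    terminated: bool,
    action: Sequence[float],
    weights: RewardWeights = RewardWeights(),
) -> float:
    """Combine a time-difference term, a termination penalty and an action penalty."""
    if len(action) != 2 or any(abs(a) > 1.0 for a in action):
        raise ValueError("action must have two components, each in [-1, 1]")

    # Assumed form of the action penalty: squared magnitude of the action vector.
    action_cost = sum(a * a for a in action)

    reward = weights.time_gain * time_gain_s - weights.action * action_cost
    if terminated:
        # Fixed penalty when the episode ends early (e.g. the car leaves the track).
        reward -= weights.termination
    return reward
```

With these assumed weights, a step that gains 0.05 s with a zero action yields a reward of 0.05, while a terminating step with a saturated action is dominated by the termination penalty.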
Abstract
In recent years, autonomous driving has become a popular field of study. Because control at the tire grip limit is essential in emergency situations, algorithms developed for race cars are useful for road cars too. This paper examines the use of Deep Reinforcement Learning (DRL) to solve the problem of grip-limit driving in a simulated environment. The Proximal Policy Optimization (PPO) method is used to train an agent that controls the steering wheel and pedals of the vehicle using only visual inputs, with the goal of achieving professional human lap times. The paper formulates time-optimal driving on a race track as a deep reinforcement learning problem and explains the chosen observations, actions, and reward functions. The results demonstrate human-like learning and driving behavior that utilizes the maximum tire grip potential.
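As an illustration of the action formulation, the sketch below decodes a two-component action in $[-1,1]^2$ into steering and pedal commands. The name `decode_action`, the `Controls` type, the ordering of the components (pedal first, steering second), and the convention that a positive pedal value means throttle and a negative one means brake are all assumptions for this example, not details confirmed by the text.

```python
from typing import NamedTuple, Sequence


class Controls(NamedTuple):
    throttle: float  # in [0, 1]
    brake: float     # in [0, 1]
    steering: float  # in [-1, 1]


def _clip(value: float, low: float = -1.0, high: float = 1.0) -> float:
    return max(low, min(high, float(value)))


def decode_action(action: Sequence[float]) -> Controls:
    """Map a 2-D action to pedal and steering commands (assumed convention)."""
    if len(action) != 2:
        raise ValueError("action must have exactly two components")

    pedal = _clip(action[0])  # assumed: first component is the combined pedal axis
    steer = _clip(action[1])  # assumed: second component is steering

    # A single pedal axis means throttle and brake are never applied together.
    return Controls(
        throttle=max(pedal, 0.0),
        brake=max(-pedal, 0.0),
        steering=steer,
    )
```

Sharing one axis between throttle and brake keeps the action space two-dimensional, at the cost of ruling out simultaneous pedal inputs.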
