Deep Reinforcement Learning for Autonomous Driving
Sen Wang, Daoyuan Jia, Xinshuo Weng
TL;DR
The paper applies reinforcement learning to autonomous driving in a continuous action space, using Deep Deterministic Policy Gradient (DDPG) in the TORCS simulator. The agent maps a 29-dimensional sensor observation to a 3-dimensional action vector (acceleration, brake, steering). It is trained with a reward that favors forward progress along the track while keeping the car near the center line: $R_t = V_x \cos(\theta) - \alpha V_x \sin(\theta) - \gamma |trackPos| - \beta V_x |trackPos|$. Here $V_x$ is the car's longitudinal speed, $\theta$ is the angle between the car's heading and the track axis, $trackPos$ is the lateral offset from the track center, and $\alpha$, $\beta$, $\gamma$ are weighting coefficients. The authors design actor and critic networks tailored to this setting, use a replay buffer and target networks, and evaluate learning stability in both training and competitive modes. The results show that the agent learns to drive fast and safely in simulation. This offers insight into continuous-control RL for autonomous driving and a starting point for future transfer to real-world systems.
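The per-step reward above can be computed directly from three sensor readings. The following is a minimal sketch, assuming $V_x$, $\theta$ (in radians) and $trackPos$ are already available as floats, for example from the TORCS `speedX`, `angle` and `trackPos` sensors. The function name `torcs_reward` is hypothetical. The default weights of 1.0 are placeholders, not the values used in the paper.

```python
import math


def torcs_reward(speed_x, angle, track_pos, alpha=1.0, beta=1.0, gamma=1.0):
    """R_t = V_x cos(theta) - alpha V_x sin(theta) - gamma |trackPos| - beta V_x |trackPos|.

    alpha, beta, gamma default to 1.0 as placeholders (assumption, not the paper's values).
    """
    # Speed projected onto the track axis: rewards forward progress.
    progress = speed_x * math.cos(angle)
    # Speed component across the track axis, as in the formula (signed sin).
    lateral = alpha * speed_x * math.sin(angle)
    # Constant penalty for drifting away from the center line.
    offset = gamma * abs(track_pos)
    # Off-center penalty that grows with speed.
    speed_offset = beta * speed_x * abs(track_pos)
    return progress - lateral - offset - speed_offset


# Driving straight down the center line at 10 units/s earns the full speed as reward.
print(torcs_reward(10.0, 0.0, 0.0))
```

With $\theta = 0$ and $trackPos = 0$, the reward reduces to $V_x$. Any heading error or lateral offset subtracts from it, and the $\beta V_x |trackPos|$ term makes an off-center position cost more at high speed.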
Abstract
Reinforcement learning has improved steadily since the resurgence of deep neural networks and now outperforms humans in many traditional games. However, this success is hard to replicate in autonomous driving: real-world state spaces are extremely complex, action spaces are continuous, and fine-grained control is required. Moreover, autonomous vehicles must maintain functional safety in these complex environments. To address these challenges, we adopt the deep deterministic policy gradient (DDPG) algorithm, which can handle complex state spaces and continuous action spaces. We then choose The Open Racing Car Simulator (TORCS) as our environment to avoid physical damage. We also select a set of appropriate sensor inputs from TORCS and design our own reward function. To fit the DDPG algorithm to TORCS, we design network architectures for both the actor and the critic within the DDPG framework. To demonstrate the effectiveness of our model, we evaluate it on different modes in TORCS and report both quantitative and qualitative results.
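DDPG trains a deterministic actor and a Q-value critic from a replay buffer. It also keeps slowly updated target copies of both networks to stabilize learning. Below is a minimal pure-Python sketch of that bookkeeping, following the standard DDPG recipe (Lillicrap et al.); it is not the authors' implementation. The actor and critic are stand-in callables rather than neural networks. The names `ReplayBuffer`, `critic_targets` and `soft_update` are hypothetical. The defaults `discount=0.99` and `tau=0.001` are values from the original DDPG paper, not hyperparameters reported here. `discount` is the RL discount factor and is unrelated to the reward weight $\gamma$.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size FIFO store of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity):
        # Oldest transitions are evicted automatically once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling breaks the correlation between consecutive driving frames.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def critic_targets(batch, target_actor, target_critic, discount=0.99):
    """Critic regression targets y = r + discount * Q'(s', mu'(s')); no bootstrap at terminal states."""
    targets = []
    for _state, _action, reward, next_state, done in batch:
        next_action = target_actor(next_state)
        bootstrap = 0.0 if done else discount * target_critic(next_state, next_action)
        targets.append(reward + bootstrap)
    return targets


def soft_update(target_params, online_params, tau=0.001):
    """Polyak averaging: theta' <- tau * theta + (1 - tau) * theta'."""
    return [tau * w + (1.0 - tau) * w_t for w, w_t in zip(online_params, target_params)]


# Toy usage: a 29-dim observation and a 3-dim (acceleration, brake, steering) action.
buf = ReplayBuffer(capacity=1000)
buf.add([0.0] * 29, [0.5, 0.0, 0.0], 1.0, [0.1] * 29, False)
batch = buf.sample(1)
ys = critic_targets(batch, target_actor=lambda s: [0.5, 0.0, 0.0], target_critic=lambda s, a: 2.0)
```

A small `tau` makes the target networks trail the online networks slowly. This keeps the critic's regression targets from shifting every step, which is the stability mechanism the target networks provide.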
