A Super-human Vision-based Reinforcement Learning Agent for Autonomous Racing in Gran Turismo

Miguel Vasco, Takuma Seno, Kenta Kawamoto, Kaushik Subramanian, Peter R. Wurman, Peter Stone

TL;DR

This paper introduces the first super-human car racing agent whose sensor input is purely local to the car, namely pixels from an ego-centric camera view and quantities that can be sensed from on-board the car, such as the car's velocity.

Abstract

Racing autonomous cars faster than the best human drivers has been a longstanding grand challenge for the fields of Artificial Intelligence and robotics. Recently, an end-to-end deep reinforcement learning agent met this challenge in a high-fidelity racing simulator, Gran Turismo. However, this agent relied on global features that require instrumentation external to the car. This paper introduces, to the best of our knowledge, the first super-human car racing agent whose sensor input is purely local to the car, namely pixels from an ego-centric camera view and quantities that can be sensed from on-board the car, such as the car's velocity. By leveraging global features only at training time, the learned agent is able to outperform the best human drivers in time trial (one car on the track at a time) races using only local input features. The resulting agent is evaluated in Gran Turismo 7 on multiple tracks and cars. Detailed ablation experiments demonstrate the agent's strong reliance on visual inputs, making it the first vision-based super-human car racing agent.
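Using global features only at training time corresponds to the asymmetric actor-critic layout described in the Figure 1 caption: the policy sees local inputs ($\mathbf{o}^p$ and image features $\mathbf{h}^i$), while the critic also sees global observations $\mathbf{o}^g$ and predicts quantiles of future returns. The following is a minimal forward-pass sketch of that information split, not the authors' implementation. The layer sizes, the random weights, the pooling-based stand-in for the convolutional encoder $q_\theta$, the grayscale image, and feeding the action to the critic are all illustrative assumptions.

```python
import math
import random

random.seed(0)

# Hypothetical dimensions; none of these values come from the paper.
P_DIM, G_DIM, H_DIM, A_DIM, N_QUANTILES = 4, 6, 8, 2, 5


def linear(in_dim, out_dim):
    """Random weight matrix (one row per output) standing in for a trained layer."""
    return [[random.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


def apply(weights, x):
    """Matrix-vector product using plain lists."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def encode_image(image, weights):
    """Stand-in for q_theta: 8x8 average pooling of a 64x64 image, then a tanh projection."""
    pooled = []
    for r in range(0, 64, 8):
        for c in range(0, 64, 8):
            block = [image[r + i][c + j] for i in range(8) for j in range(8)]
            pooled.append(sum(block) / 64.0)
    return [math.tanh(v) for v in apply(weights, pooled)]


def actor(o_p, h_i, weights):
    """pi_phi: receives local inputs only (o^p and image features h^i)."""
    return [math.tanh(v) for v in apply(weights, o_p + h_i)]


def critic(o_p, o_g, action, weights):
    """Q_psi: receives o^p plus global o^g (and, by assumption, the action)."""
    return apply(weights, o_p + o_g + action)


encoder_w = linear(64, H_DIM)            # 64 pooled cells -> image features
actor_w = linear(P_DIM + H_DIM, A_DIM)
critic_w = linear(P_DIM + G_DIM + A_DIM, N_QUANTILES)

# Synthetic observations, for illustration only.
image = [[random.random() for _ in range(64)] for _ in range(64)]
o_p = [random.gauss(0.0, 1.0) for _ in range(P_DIM)]
o_g = [random.gauss(0.0, 1.0) for _ in range(G_DIM)]

h_i = encode_image(image, encoder_w)
action = actor(o_p, h_i, actor_w)                 # all that is needed at execution time
quantiles = critic(o_p, o_g, action, critic_w)    # needed only during training
```

Because `actor` never receives `o_g`, the critic can be discarded after training, and the deployed policy depends only on quantities available on board the car.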

Paper Structure (31 sections, 4 equations, 39 figures, 4 tables)

Figures (39)

  • Figure 1: Our vision-based RL agent for autonomous car racing. (Left) We exploit an asymmetric actor-critic architecture to train our agent: the policy network $\pi_{\phi}$ is provided with propriocentric information $\mathbf{o}^p$ and image features $\mathbf{h}^i$, encoded with a convolutional neural network $q_\theta$, to output actions $\mathbf{a}$. The critic network $Q_\psi$ is provided with local propriocentric observations and global observations $\mathbf{o}^g$ (i.e., course shape information) to predict quantiles of future returns. (Right) During execution, our agent only receives local features from the Gran Turismo 7 simulator.
  • Figure 2: Examples of 64$\times$64 image observations in (left) Monza, (middle) Tokyo, and (right) Spa.
  • Figure 3: (Top) Lap time across all scenarios. We consider five randomly-seeded training runs and average the results over 500 evaluation laps, with 100 laps executed by the fastest policy in each training run. We highlight results that are significantly faster than the fastest human time (using a Wilcoxon signed-rank test, with $p<0.001$); (Bottom) Distribution of lap times in Monza (left), Tokyo (middle) and Spa (right).
  • Figure 4: Performance study of our racing agent in the Monza scenario in relation to the training architecture (left), local features (middle) and the image feature (right). We consider five randomly-seeded training runs and show the distribution of 500 evaluation laps, with 100 laps executed by the fastest policy in each training run. We highlight the lap time of the fastest human player (black line). One symmetric run failed to learn meaningful behavior and we exclude it from the analysis.
  • Figure 5: (Left) Trajectory comparison in the Monza track between our agent and the fastest human player in a chicane section. (Right) Gap of our agent to the human driver. Lower is better.
  • ...and 34 more figures
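The Figure 3 caption reports significance against the fastest human time using a Wilcoxon signed-rank test with $p<0.001$. A minimal sketch of such a test is given below. It assumes a one-sample, one-sided setup (are the agent's lap times smaller than a single reference time?) and uses the large-sample normal approximation without tie or continuity corrections. The caption does not specify these details, and the function name `wilcoxon_less` and the lap times are hypothetical.

```python
import math
import random


def wilcoxon_less(samples, reference):
    """One-sample, one-sided Wilcoxon signed-rank test (normal approximation).

    Alternative hypothesis: samples tend to be smaller than `reference`.
    Returns (w_plus, p_value).
    """
    # Zero differences carry no sign information and are dropped.
    diffs = [s - reference for s in samples if s != reference]
    n = len(diffs)
    if n == 0:
        raise ValueError("all samples equal the reference")

    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # W+ is the rank sum of the positive differences (laps slower than the reference).
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4.0
    std = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (w_plus - mean) / std
    p_value = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # P(Z <= z)
    return w_plus, p_value


# Synthetic lap times in arbitrary units: 5 runs x 100 laps, NOT data from the paper.
random.seed(0)
human_best = 100.0
agent_laps = [human_best + random.gauss(-0.5, 0.2) for _ in range(500)]

w_plus, p_value = wilcoxon_less(agent_laps, human_best)
significant = p_value < 0.001
```

A small `w_plus` means few laps were slower than the reference, which pushes the one-sided p-value toward zero. For small samples or heavily tied data, an exact test from a statistics library would be the safer choice.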