Learning to Drive in a Day

Alex Kendall, Jeffrey Hawke, David Janz, Przemyslaw Mazur, Daniele Reda, John-Mark Allen, Vinh-Dieu Lam, Alex Bewley, Amar Shah

TL;DR

The paper demonstrates the first application of deep reinforcement learning to autonomous driving. It frames lane following as a Markov decision process (MDP) and solves it with on-vehicle training from a single monocular image. It uses Deep Deterministic Policy Gradient (DDPG) with a two-dimensional continuous action space and a simple reward: the distance travelled before the safety driver intervenes. The approach is validated both in a procedurally generated lane-following simulator and on a real Renault Twizy. The authors describe an on-vehicle training workflow driven by safety-driver feedback, and they compare learning from raw pixels with learning from a variational autoencoder (VAE) state representation; the VAE substantially improves data efficiency in the real-world experiments. The work shows that reinforcement learning can learn to drive with minimal supervision and without pre-defined maps. It identifies reward design, representation learning, and domain transfer as key directions for scaling the approach to broader autonomous driving tasks.
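DDPG is an actor-critic method. A deterministic actor maps a state to a continuous action, and a critic estimates that action's value. Both networks are trained against slowly updated target copies. The sketch below shows the three generic ingredients that make DDPG work with continuous controls: the critic's Bellman target, the soft (Polyak) target-network update, and exploration noise clipped to the action bounds. It is a plain-Python illustration under assumptions and not the authors' code. The function names are hypothetical, parameters are flat lists in place of real network weights, and Gaussian exploration noise is used as a simple substitute for the Ornstein-Uhlenbeck noise of the original DDPG paper.

```python
import random
from typing import List, Sequence


def critic_target(reward: float, next_q: float, gamma: float, terminal: bool) -> float:
    """Bellman target for the critic: y = r + gamma * Q'(s', mu'(s')).

    `next_q` stands for the target critic evaluated at the target actor's
    action in the next state. On a terminal step (e.g. the safety driver
    took over) there is no bootstrap term.
    """
    return reward if terminal else reward + gamma * next_q


def soft_update(target: Sequence[float], source: Sequence[float], tau: float) -> List[float]:
    """Polyak-average target parameters toward the online ones:
    theta' <- tau * theta + (1 - tau) * theta'."""
    return [tau * s + (1.0 - tau) * t for s, t in zip(source, target)]


def explore(action: Sequence[float], noise_std: float,
            low: Sequence[float], high: Sequence[float],
            rng: random.Random) -> List[float]:
    """Add zero-mean Gaussian noise to a deterministic action, then clip each
    dimension to its bounds (Gaussian noise is a simplification here)."""
    return [min(max(a + rng.gauss(0.0, noise_std), lo), hi)
            for a, lo, hi in zip(action, low, high)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical 2-D action with made-up bounds for each dimension.
    noisy = explore([0.2, 5.0], noise_std=0.1, low=[-1.0, 0.0], high=[1.0, 10.0], rng=rng)
    print(noisy)
```

A small `tau` makes the target networks trail the online networks slowly. This keeps the critic's regression target from shifting at every gradient step, which is the main stabilising trick that lets DDPG learn from few episodes.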

Abstract

We demonstrate the first application of deep reinforcement learning to autonomous driving. From randomly initialised parameters, our model is able to learn a policy for lane following in a handful of training episodes using a single monocular image as input. We provide a general and easy to obtain reward: the distance travelled by the vehicle without the safety driver taking control. We use a continuous, model-free deep reinforcement learning algorithm, with all exploration and optimisation performed on-vehicle. This demonstrates a new framework for autonomous driving which moves away from reliance on defined logical rules, mapping, and direct supervision. We discuss the challenges and opportunities to scale this approach to a broader range of autonomous driving tasks.
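The reward described above can be made concrete with a short sketch. It rests on a hypothetical telemetry format: at each control step, the vehicle logs its forward speed, the step duration, and a flag that is set once the safety driver takes control. The names `Step` and `episode_rewards` are illustrative and do not come from the paper. Each step pays out the distance covered during that step, so the episode return equals the distance travelled before intervention. An intervention simply ends the episode.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Step:
    """One control step of vehicle telemetry (hypothetical fields)."""
    speed_mps: float   # forward speed from odometry, metres per second
    dt_s: float        # duration of the control step, seconds
    intervened: bool   # True once the safety driver has taken control


def episode_rewards(steps: Iterable[Step]) -> Tuple[List[float], bool]:
    """Per-step rewards whose sum is the distance driven before intervention.

    Returns the reward list and whether the episode ended because the
    safety driver intervened (a terminal transition for the learner).
    """
    rewards: List[float] = []
    for step in steps:
        if step.intervened:
            # The driver took over: end the episode, no reward for this step.
            return rewards, True
        # Distance covered during this step = speed * time.
        rewards.append(step.speed_mps * step.dt_s)
    return rewards, False
```

An intervention cuts off all future reward, so the agent is pushed toward behaviour that keeps the safety driver from taking over. No hand-written rule has to define what good lane following looks like.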

Paper Structure

This paper contains 16 sections, 5 equations, 4 figures, 1 table.

Figures (4)

  • Figure 1: We design a deep reinforcement learning algorithm for autonomous driving. This figure illustrates the actor-critic algorithm which we use to learn a policy and value function for driving. Our agent maximises the reward of distance travelled before intervention by a safety driver. A video of our vehicle learning to drive is available at https://wayve.ai/blog/l2diad
  • Figure 2: Outline of the workflow and the architecture for efficiently training the algorithm from a safety driver's feedback.
  • Figure 3: Examples of different road environments randomly generated for each episode in our lane following simulator. We use procedural generation to randomly vary road texture, lane markings and road topology each episode. We train using a forward facing driver-view image as input.
  • Figure 4: Using a VAE with DDPG greatly improves data efficiency in training over DDPG from raw pixels, suggesting that state representation is an important consideration for applying reinforcement learning on real systems. The 250m driving route used for our experiments is shown on the right.
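Figure 4 attributes much of the real-world gain in data efficiency to learning from a compressed VAE state instead of raw pixels. The sketch below shows the standard VAE pieces such a representation relies on, written in plain Python so it is self-contained: the reparameterisation trick, the KL term for a diagonal-Gaussian posterior, and a squared-error reconstruction loss. It is a generic illustration under stated assumptions. The posterior is assumed to be diagonal Gaussian, the reconstruction error is assumed to be squared error, and the weight `beta` is optional. The paper's encoder architecture and loss weighting are not given in this section. In a setup like this one, the DDPG actor and critic would take a latent vector (for example, the encoder mean `mu`, itself an assumption) as their state instead of the image.

```python
import math
import random
from typing import List, Sequence


def reparameterize(mu: Sequence[float], log_var: Sequence[float],
                   rng: random.Random) -> List[float]:
    """Sample z = mu + sigma * eps with eps ~ N(0, 1) (reparameterisation trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu: Sequence[float], log_var: Sequence[float]) -> float:
    """KL(N(mu, diag(sigma^2)) || N(0, I)) = 0.5 * sum(sigma^2 + mu^2 - 1 - log sigma^2)."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def vae_loss(x: Sequence[float], x_recon: Sequence[float],
             mu: Sequence[float], log_var: Sequence[float],
             beta: float = 1.0) -> float:
    """Squared-error reconstruction plus beta-weighted KL term.

    With beta = 1 this is a common form of the negative ELBO (it matches a
    fixed-variance Gaussian decoder up to constants and scaling).
    """
    recon = sum((a - b) ** 2 for a, b in zip(x, x_recon))
    return recon + beta * kl_to_standard_normal(mu, log_var)
```

The KL term pulls the latent codes toward a standard normal, which keeps the state space smooth and compact for the policy. The reconstruction term forces those few latent dimensions to retain what is visible in the image.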