Learning autonomous driving from aerial imagery

Varun Murali, Guy Rosman, Sertac Karaman, Daniela Rus

TL;DR

This work uses a Neural Radiance Field (NeRF) as an intermediate representation to synthesize novel views from the point of view of a ground vehicle, and demonstrates the utility of novel view synthesis by training an end-to-end policy from images and depth data.

Abstract

In this work, we consider the problem of learning end-to-end perception-to-control for ground vehicles solely from aerial imagery. Photogrammetric simulators allow the synthesis of novel views by transforming pre-generated assets. However, they have a large setup cost, require careful collection of data, and often need human effort to create usable simulators. We use a Neural Radiance Field (NeRF) as an intermediate representation to synthesize novel views from the point of view of a ground vehicle. These novel viewpoints can then be used for several downstream autonomous navigation applications. In this work, we demonstrate the utility of novel view synthesis through the application of training a policy for end-to-end learning from images and depth data. In a traditional real-to-sim-to-real framework, the collected data would be transformed into a visual simulator, which could then be used to generate novel views. In contrast, using a NeRF allows a compact representation and the ability to optimize over the parameters of the visual simulator as more data is gathered in the environment. We demonstrate the efficacy of our method in a custom-built mini-city environment through the deployment of imitation policies on robotic cars. We additionally consider the task of place localization and demonstrate that our method is able to relocalize the car in the real world.
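
To make the idea of querying views "from the point of view of a ground vehicle" concrete, the following is a minimal sketch of how camera poses could be laid out along a road path before being passed to a trained NeRF renderer. It is not the authors' implementation. The function name `ground_poses_along_path`, the `camera_height` parameter, and the axis convention (world z up; camera x = right, y = down, z = forward) are all assumptions made for illustration.

```python
import numpy as np


def ground_poses_along_path(waypoints_xy, camera_height=0.1):
    """Return an array of 4x4 camera-to-world poses, one per path segment.

    Assumed convention (hypothetical, not taken from the paper):
    world z is up; camera axes are x = right, y = down, z = forward.
    """
    waypoints_xy = np.asarray(waypoints_xy, dtype=float)
    poses = []
    for start, end in zip(waypoints_xy[:-1], waypoints_xy[1:]):
        heading = end - start
        norm = np.linalg.norm(heading)
        if norm == 0.0:
            continue  # skip repeated waypoints, which have no heading
        # The camera looks along the road segment, parallel to the ground.
        forward = np.array([heading[0], heading[1], 0.0]) / norm
        down = np.array([0.0, 0.0, -1.0])
        right = np.cross(down, forward)  # completes a right-handed frame
        pose = np.eye(4)
        pose[:3, 0] = right
        pose[:3, 1] = down
        pose[:3, 2] = forward
        # Place the camera at the segment start, at vehicle height.
        pose[:3, 3] = [start[0], start[1], camera_height]
        poses.append(pose)
    return np.array(poses).reshape(-1, 4, 4)


# Example: an L-shaped path with two segments.
example_poses = ground_poses_along_path([(0, 0), (1, 0), (1, 1)])
```

Each pose would then be handed to whatever rendering call the chosen NeRF implementation exposes, producing the image and depth pair used for policy training.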

Paper Structure

This paper contains 19 sections, 6 figures, 2 tables, 2 algorithms.

Figures (6)

  • Figure 1: An overview of the method presented in this work. First, we assume that we are given aerial images and their corresponding poses (A). We then reconstruct a photogrammetric model (B) of the world from bird's-eye-view (BEV) images and use this intermediate representation to query ground-robot views at desired poses (C). We use the synthesized images to learn policies that can be directly deployed on real vehicles (D).
  • Figure 2: An overview of our proposed method. (A) We assume that we are given images and their corresponding poses from the bird's-eye view. (B) We leverage a neural radiance field (NeRF) to compactly represent the density and color of the desired scene. The NeRF can be queried for poses along the road network. (C) We then learn two tasks relevant to autonomous driving: (i) visual localization and (ii) end-to-end driving.
  • Figure 3: The figure shows the desired trajectories in the minicity environment. The first trajectory is shown in red, the second in blue, and the third in yellow.
  • Figure 4: Experimental setup used. The figure shows an example configuration of the houses, the road network, the motion capture system and the Parrot Bebop drone used to collect data.
  • Figure 5: Rendered image from the NeRF from the point of view of the ground robot with ground priors (top) and without (bottom).
  • ...and 1 more figure
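
The Figure 2 caption describes the NeRF as a compact representation of scene density and color, and the policy is trained on both images and depth. As a rough illustration of how a color and a depth value are obtained from such a representation, here is a sketch of the standard NeRF volume-rendering quadrature for a single ray. It is a generic textbook form, not code from this paper. The function name `composite_ray` and the choice to treat the final sample interval as effectively infinite are assumptions.

```python
import numpy as np


def composite_ray(densities, colors, t_vals):
    """Composite per-sample density and color along one ray.

    densities: (N,) non-negative volume densities at the samples
    colors:    (N, 3) RGB values at the samples
    t_vals:    (N,) increasing sample distances along the ray
    Returns (rgb, depth, weights).
    """
    densities = np.asarray(densities, dtype=float)
    colors = np.asarray(colors, dtype=float)
    t_vals = np.asarray(t_vals, dtype=float)

    # Spacing between adjacent samples; the last interval is treated as
    # very large (an assumption, common in NeRF implementations).
    deltas = np.append(np.diff(t_vals), 1e10)

    # Opacity contributed by each interval.
    alphas = 1.0 - np.exp(-densities * deltas)

    # Transmittance: probability the ray reaches sample i unoccluded.
    transmittance = np.concatenate([[1.0], np.cumprod(1.0 - alphas[:-1])])

    weights = transmittance * alphas
    rgb = (weights[:, None] * colors).sum(axis=0)
    depth = (weights * t_vals).sum()  # expected termination distance
    return rgb, depth, weights
```

For a ray that passes through empty space and then hits an opaque surface, nearly all of the weight falls on the first dense sample, so the rendered color and depth are those of that surface.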