Joint Localization and Planning using Diffusion

L. Lao Beyer, S. Karaman

TL;DR

This work introduces a diffusion model that produces collision-free paths in a global reference frame given an egocentric LIDAR scan, an arbitrary map, and a desired goal position. It also describes how to condition the denoising process on both obstacles and sensor observations.

Abstract

Diffusion models have been successfully applied to robotics problems such as manipulation and vehicle path planning. In this work, we explore their application to end-to-end navigation -- including both perception and planning -- by considering the problem of jointly performing global localization and path planning in known but arbitrary 2D environments. In particular, we introduce a diffusion model which produces collision-free paths in a global reference frame given an egocentric LIDAR scan, an arbitrary map, and a desired goal position. To this end, we implement diffusion in the space of paths in SE(2), and describe how to condition the denoising process on both obstacles and sensor observations. In our evaluation, we show that the proposed conditioning techniques enable generalization to realistic maps of considerably different appearance than the training environment, demonstrate our model's ability to accurately describe ambiguous solutions, and run extensive simulation experiments showcasing our model's use as a real-time, end-to-end localization and planning stack.

Paper Structure (15 sections, 7 equations, 9 figures, 1 table)

Figures (9)

  • Figure 1: Proposed model. A denoising diffusion process is conditioned on an obstacle map, a LIDAR scan, and a goal pose, producing a collision-free path in the global map frame.
  • Figure 2: Local conditioning strategy based on sampling of the encoded obstacle map $G_\theta(\mathcal{E})$ shown for two different noise levels. Samples of $G_\theta(\mathcal{E})$ are appended to the corresponding pose and fed into the denoising network.
  • Figure 3: Obstacle map encoding using a U-Net encoder. Top row shows test environments with obstacles in blue. Bottom row visualizes the corresponding encoded obstacle feature maps by mapping the first three principal components of each feature onto the RGB channels. Feature maps contain structure reminiscent of a Voronoi decomposition and also appear to encode distance to obstacles.
  • Figure 4: Sensor observation conditioning for global localization. Given the (noisy) start pose $T_0^{(t)}$ and LIDAR observation $\mathcal{O}$, we calculate the termination position of each ray to determine the location at which to sample the localization feature map $H_\theta(\mathcal{E})$. The concatenation of the sampled features serves as conditioning for the denoising network.
  • Figure 5: Random example scenarios produced by the dataset generation procedure. Obstacles shown in blue, expert trajectory produced by B-spline optimization shown in orange, and LIDAR scan shown in red.
  • ...and 4 more figures
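
The sampling step described in the captions of Figures 2 and 4 can be illustrated with a minimal sketch; it is not the authors' implementation. Given a pose and a LIDAR scan, it computes where each ray terminates in the map frame and reads a feature map at those locations. Everything below is an assumption made for illustration: the function names, the scan parameterization (`angle_min`, `angle_increment`), the use of a single-channel feature grid in place of the learned map $H_\theta(\mathcal{E})$, and the choice of bilinear interpolation (the captions only say the map is "sampled").

```python
import math


def ray_endpoints(pose, ranges, angle_min, angle_increment):
    """Map LIDAR ranges taken at pose = (x, y, theta) to map-frame endpoints.

    Assumed convention: ray i has bearing angle_min + i * angle_increment
    relative to the robot heading theta.
    """
    x, y, theta = pose
    points = []
    for i, r in enumerate(ranges):
        bearing = theta + angle_min + i * angle_increment
        points.append((x + r * math.cos(bearing), y + r * math.sin(bearing)))
    return points


def sample_bilinear(grid, px, py, resolution=1.0):
    """Bilinearly interpolate a scalar grid (grid[row][col], row = y) at a metric point."""
    rows, cols = len(grid), len(grid[0])
    # Convert metric coordinates to fractional cell indices, clamped to the grid.
    u = min(max(px / resolution, 0.0), cols - 1.0)
    v = min(max(py / resolution, 0.0), rows - 1.0)
    c0, r0 = int(u), int(v)
    c1, r1 = min(c0 + 1, cols - 1), min(r0 + 1, rows - 1)
    fu, fv = u - c0, v - r0
    top = grid[r0][c0] * (1 - fu) + grid[r0][c1] * fu
    bottom = grid[r1][c0] * (1 - fu) + grid[r1][c1] * fu
    return top * (1 - fv) + bottom * fv


def lidar_conditioning(pose, ranges, angle_min, angle_increment, feature_map):
    """Concatenate feature samples taken at each ray's termination point."""
    endpoints = ray_endpoints(pose, ranges, angle_min, angle_increment)
    return [sample_bilinear(feature_map, px, py) for px, py in endpoints]


# Hypothetical 3x3 single-channel feature map and a two-ray scan.
feature_map = [[0.0, 1.0, 2.0], [10.0, 11.0, 12.0], [20.0, 21.0, 22.0]]
conditioning = lidar_conditioning(
    (1.0, 1.0, 0.0), [1.0, 1.0], 0.0, math.pi / 2, feature_map
)
```

Under the same assumptions, the local obstacle conditioning of Figure 2 would amount to calling `sample_bilinear` at each pose position along the path and appending the result to that pose. Because the endpoints depend on the (noisy) start pose, the sampled features change as the pose estimate moves during denoising.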