MEM: Multi-Modal Elevation Mapping for Robotics and Learning

Gian Erni, Jonas Frey, Takahiro Miki, Matias Mattamala, Marco Hutter

TL;DR

This work extends a 2.5D robot-centric elevation mapping framework by fusing multi-modal information from multiple sources into a popular map representation and presents a set of fusion algorithms that can be selected based on the information type and user requirements.

Abstract

Elevation maps are commonly used to represent the environment of mobile robots and are instrumental for locomotion and navigation tasks. However, pure geometric information is insufficient for many field applications that require appearance or semantic information, which limits their applicability to other platforms or domains. In this work, we extend a 2.5D robot-centric elevation mapping framework by fusing multi-modal information from multiple sources into a popular map representation. The framework allows inputting data contained in point clouds or images in a unified manner. To manage the different nature of the data, we also present a set of fusion algorithms that can be selected based on the information type and user requirements. Our system is designed to run on the GPU, making it real-time capable for various robotic and learning tasks. We demonstrate the capabilities of our framework by deploying it on multiple robots with varying sensor configurations and showcasing a range of applications that utilize multi-modal layers, including line detection, human detection, and colorization.

Paper Structure (21 sections, 12 equations, 8 figures, 3 tables)

Figures (8)

  • Figure 1: Three different applications of our multi-modal elevation mapping framework. Left: PCA layer of visual features on a crosswalk using LiDAR, depth and RGB cameras. Center: semantic segmentation layer in an agricultural field using an RGB-D sensor. Right: semantic segmentation layer in a garden using LiDAR, depth and RGB cameras.
  • Figure 2: Overview of our multi-modal elevation map structure. The framework takes multi-modal images (purple) and multi-modal point clouds (blue) as input. The data is first associated with the map cells and then fused into the various map layers using different fusion algorithms. Finally, the map can be post-processed with custom plugins to generate new layers (e.g. traversability) or to process layers for external components (e.g. line detection).
  • Figure 3: Association of pixel-wise semantic information to individual cells within the map. Each cell within the frustum of the corresponding camera is projected onto the image plane. An efficient ray-casting approach checks if the cell is visible or occluded by another cell based on the cell's height.
  • Figure 4: Behaviour of the fusion algorithms over time. (a) shows the prior at $t_0$ and the measurement that falls into one cell. (b) depicts the exponential decay of the Class 0 confidence under exponential averaging fusion. (c) illustrates the Bayesian inference fusion method, which exhibits a more gradual decrease in the Class 0 confidence.
  • Figure 5: Post-processing. This figure shows how the map can be used to generate insights and new layers. On the left, the input image and the feature extraction step are displayed. This data is fed into the elevation map in the middle of the figure. The top right shows a post-processing plugin that generates the left (red) and right (blue) tree-line predictions of a vineyard. On the lower right, a plugin takes feature layers as input and generates a PCA layer.
  • ...and 3 more figures
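
The contrast described in the Figure 4 caption can be illustrated with a minimal sketch. It is not the framework's GPU implementation. It assumes that exponential averaging blends the stored class probabilities with each new measurement using a weight `alpha`, and that the Bayesian variant keeps Dirichlet pseudo-counts per class and reports their normalized mean. The function names, `alpha = 0.5`, and the prior strength of 10 pseudo-observations are all hypothetical choices made for illustration.

```python
from typing import List, Tuple


def exponential_average(prior: List[float], measurement: List[float],
                        alpha: float = 0.5) -> List[float]:
    """Blend stored class probabilities with a new measurement (assumed form)."""
    return [(1.0 - alpha) * p + alpha * m for p, m in zip(prior, measurement)]


def dirichlet_update(counts: List[float], measurement: List[float]) -> List[float]:
    """Add the measured class probabilities to the Dirichlet pseudo-counts."""
    return [c + m for c, m in zip(counts, measurement)]


def dirichlet_mean(counts: List[float]) -> List[float]:
    """Posterior mean of the class probabilities under a Dirichlet distribution."""
    total = sum(counts)
    return [c / total for c in counts]


def simulate(steps: int = 5, alpha: float = 0.5) -> Tuple[List[float], List[float]]:
    """Track the Class 0 confidence of one cell under both fusion rules."""
    prior = [0.9, 0.1]        # the cell initially favours Class 0
    measurement = [0.0, 1.0]  # every new measurement votes for Class 1

    exp_state = list(prior)
    # Hypothetical prior strength: 10 pseudo-observations spread according to the prior.
    counts = [10.0 * p for p in prior]

    exp_history = [exp_state[0]]
    bayes_history = [dirichlet_mean(counts)[0]]
    for _ in range(steps):
        exp_state = exponential_average(exp_state, measurement, alpha)
        counts = dirichlet_update(counts, measurement)
        exp_history.append(exp_state[0])
        bayes_history.append(dirichlet_mean(counts)[0])
    return exp_history, bayes_history


if __name__ == "__main__":
    exp_history, bayes_history = simulate()
    for t, (e, b) in enumerate(zip(exp_history, bayes_history)):
        print(f"t={t}: exponential={e:.3f}  bayesian={b:.3f}")
```

With these assumed settings, the exponentially averaged Class 0 confidence halves at every step, while the Dirichlet mean falls more slowly because each measurement adds only one pseudo-observation to a prior worth ten. How gradual either curve is depends entirely on `alpha` and on the assumed prior strength.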