Table of Contents
Fetching ...

ConvBKI: Real-Time Probabilistic Semantic Mapping Network with Quantifiable Uncertainty

Joey Wilson, Yuewei Fu, Joshua Friesen, Parker Ewen, Andrew Capodieci, Paramsothy Jayakumar, Kira Barton, Maani Ghaffari

TL;DR

A modular neural network for real-time (>10 Hz) semantic mapping in uncertain environments, which explicitly updates per-voxel probabilistic distributions within a neural network layer by leveraging conjugate priors is developed.

Abstract

In this paper, we develop a modular neural network for real-time {\color{black}(> 10 Hz)} semantic mapping in uncertain environments, which explicitly updates per-voxel probabilistic distributions within a neural network layer. Our approach combines the reliability of classical probabilistic algorithms with the performance and efficiency of modern neural networks. Although robotic perception is often divided between modern differentiable methods and classical explicit methods, a union of both is necessary for real-time and trustworthy performance. We introduce a novel Convolutional Bayesian Kernel Inference (ConvBKI) layer which incorporates semantic segmentation predictions online into a 3D map through a depthwise convolution layer by leveraging conjugate priors. We compare ConvBKI against state-of-the-art deep learning approaches and probabilistic algorithms for mapping to evaluate reliability and performance. We also create a Robot Operating System (ROS) package of ConvBKI and test it on real-world perceptually challenging off-road driving data.

ConvBKI: Real-Time Probabilistic Semantic Mapping Network with Quantifiable Uncertainty

TL;DR

A modular neural network for real-time (>10 Hz) semantic mapping in uncertain environments, which explicitly updates per-voxel probabilistic distributions within a neural network layer by leveraging conjugate priors is developed.

Abstract

In this paper, we develop a modular neural network for real-time {\color{black}(> 10 Hz)} semantic mapping in uncertain environments, which explicitly updates per-voxel probabilistic distributions within a neural network layer. Our approach combines the reliability of classical probabilistic algorithms with the performance and efficiency of modern neural networks. Although robotic perception is often divided between modern differentiable methods and classical explicit methods, a union of both is necessary for real-time and trustworthy performance. We introduce a novel Convolutional Bayesian Kernel Inference (ConvBKI) layer which incorporates semantic segmentation predictions online into a 3D map through a depthwise convolution layer by leveraging conjugate priors. We compare ConvBKI against state-of-the-art deep learning approaches and probabilistic algorithms for mapping to evaluate reliability and performance. We also create a Robot Operating System (ROS) package of ConvBKI and test it on real-world perceptually challenging off-road driving data.
Paper Structure (21 sections, 19 equations, 17 figures, 6 tables)

This paper contains 21 sections, 19 equations, 17 figures, 6 tables.

Figures (17)

  • Figure 1: Example output of our method (ConvBKI) on Semantic KITTI sequence 0 SemanticKITTI. ConvBKI recurrently updates and maintains a semantic map with uncertainty in real-time by reframing the Bayesian update step as a depthwise separable convolution operation. Voxels encode the expected semantic category and variance as concentration parameters of the Dirichlet distribution, which is the conjugate prior of the categorical distribution and multinomial distribution. The input to the network is a 3D point cloud from a stereo-camera or LiDAR sensor. The points are labeled with semantic classification probabilities by a semantic segmentation network which provides per-point predictions. ConvBKI updates the semantic belief of each voxel by spatially weighting the input semantic point clouds through a unique geometric kernel learned for each semantic category. Fig. \ref{['fig:KITTI_Semantics']} shows the maximum likelihood semantic prediction per voxel, while Fig. \ref{['fig:KITTI_Variance']} shows the variance per voxel calculated with Eq. \ref{['eq:Variance']}. Variance is linearly translated to RGB colors, where blue indicates voxels with a low variance and red indicates a high variance.
  • Figure 2: Sparse Kernel Function. $\kappa(d)$ has a maximum value of 1 at $d=0$, and decays to 0 by $d=l$. When applied to semantic mapping, points proximal to the voxel centroid have more influence over the semantic label of the voxel than points further from the centroid.
  • Figure 3: Structural overview of ConvBKI. Input 3D points are labeled with semantic segmentation predictions by a semantic segmentation neural network, and are voxelized into a dense voxel grid ($\textbf{F}$) where each voxel contains the summation over the semantic segmentation predictions of all points within the voxel. A 3D depthwise separable convolution performs the Bayesian update in real-time, where the posterior semantic voxel grid is equal to the convolution of $\textbf{F}$ plus the prior voxels from the last time-step.
  • Figure 4: Illustration of compound kernel motivation. ConvBKI learns a distribution to geometrically associate points with voxels. Whereas a point (red) labeled pole suggests a vertically adjacent voxel (blue) may also be a pole, it does not imply the same for a horizontally adjacent voxel. The image on the right demonstrates a slice of the learned weights for the class pole, which matches the expectations shown on the left.
  • Figure 5: Toy 2D illustration of local mapping method. When the ego vehicle moves from time $t-1$ to a new location at time $t$, the ego-vehicle position is approximated to the nearest voxel centroid. The local map $L_{t-1}$ shifts to the new map center and discards voxels outside of $L_t$. New voxels are initialized to prior, represented by the question mark. Since the center of map is discretized, an offset represented by $s_t$ in red is maintained to transform input point clouds to the map frame. This method prioritizes local semantic scene understanding over loop closure, as voxels will be initialized to prior when re-visited.
  • ...and 12 more figures