UFO: Uncertainty-aware LiDAR-image Fusion for Off-road Semantic Terrain Map Estimation

Ohn Kim, Junwon Seo, Seongyong Ahn, Chong Hui Kim

TL;DR

This paper presents a learning-based fusion method that generates dense terrain classification maps in BEV by performing LiDAR-image fusion at multiple scales. Experiments on off-road driving datasets show improved accuracy in off-road terrain, supporting reliable and safe autonomous navigation in challenging off-road settings.

Abstract

Autonomous off-road navigation requires an accurate semantic understanding of the environment, often converted into a bird's-eye view (BEV) representation for various downstream tasks. While learning-based methods have shown success in generating local semantic terrain maps directly from sensor data, their efficacy in off-road environments is hindered by challenges in accurately representing uncertain terrain features. This paper presents a learning-based fusion method for generating dense terrain classification maps in BEV. By performing LiDAR-image fusion at multiple scales, our approach enhances the accuracy of semantic maps generated from an RGB image and a single-sweep LiDAR scan. Utilizing uncertainty-aware pseudo-labels further enhances the network's ability to learn reliably in off-road environments without requiring precise 3D annotations. By conducting thorough experiments using off-road driving datasets, we demonstrate that our method can improve accuracy in off-road terrains, validating its efficacy in facilitating reliable and safe autonomous navigation in challenging off-road settings.

Paper Structure

This paper contains 18 sections, 7 equations, 3 figures, and 3 tables.

Figures (3)

  • Figure 1: Overview of image-guided pseudo-label generation. A pre-trained 2D image segmentation network derives semantic segmentation results from past and future images. These outcomes are aggregated using paired point clouds and then projected onto BEV grids to generate the pseudo-ground truth. Each grid determines a pseudo-label through the argmax operation, while its uncertainty is also quantified. Areas within the white box exhibit inconsistent semantic predictions across multiple timesteps, leading to higher uncertainties, depicted by brighter colors.
  • Figure 2: High-level architecture of the proposed method. The network takes a single-sweep LiDAR point cloud and an RGB image captured by the front camera as input and produces a dense semantic terrain classification map in BEV. Features extracted from the image and the point cloud by distinct encoders are fused using Multi-scale Attentive Feature Fusion, which is integrated into the encoder of each modality's 3D UNet. The fused features are then passed to a 2D UNet to generate the dense semantic terrain classification map in BEV.
  • Figure 3: Compared to other methods, our method successfully predicts a semantic terrain map by using LiDAR-image fusion. The camera-only method excels at extracting semantic information about the terrain but fails to produce a map with accurate geographical information in complex off-road environments. LiDAR-only methods, on the other hand, represent geographical information accurately but are weak at semantic classification of terrain.
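
The pseudo-label step described in the Figure 1 caption (aggregate per-point semantic predictions over several timesteps, project them onto BEV grids, take the argmax per grid, and quantify uncertainty) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `bev_pseudo_labels`, its parameters, and the choice of normalized entropy as the uncertainty measure are all assumptions, since the caption does not say how uncertainty is computed.

```python
import numpy as np

def bev_pseudo_labels(points_xy, point_classes, num_classes,
                      grid_size=(4, 4), cell_m=1.0, origin=(0.0, 0.0)):
    """Aggregate per-point class predictions into a BEV grid (hypothetical helper).

    points_xy:     (N, 2) metric x/y coordinates, already accumulated over timesteps.
    point_classes: (N,) class index per point, taken from the 2D segmentation.
    num_classes:   number of semantic classes (must be > 1).
    Returns (labels, uncertainty, counts). Empty cells get label -1 and
    uncertainty 0; use `counts` to mask them out.
    """
    h, w = grid_size
    xy = np.asarray(points_xy, dtype=float)
    classes = np.asarray(point_classes, dtype=int)

    # Map metric coordinates to integer cell indices and drop points off the grid.
    ij = np.floor((xy - np.asarray(origin, dtype=float)) / cell_m).astype(int)
    inside = (ij[:, 0] >= 0) & (ij[:, 0] < h) & (ij[:, 1] >= 0) & (ij[:, 1] < w)
    ij, classes = ij[inside], classes[inside]

    # Per-cell class histogram; np.add.at handles repeated indices correctly.
    hist = np.zeros((h, w, num_classes), dtype=np.int64)
    np.add.at(hist, (ij[:, 0], ij[:, 1], classes), 1)

    counts = hist.sum(axis=-1)
    # Pseudo-label: most frequent class in each cell (ties go to the lowest index).
    labels = np.where(counts > 0, hist.argmax(axis=-1), -1)

    # Assumed uncertainty measure: entropy of the class distribution,
    # normalized to [0, 1] by log(num_classes).
    probs = hist / np.maximum(counts, 1)[..., None]
    logp = np.log(probs, where=probs > 0, out=np.zeros_like(probs))
    uncertainty = -(probs * logp).sum(axis=-1) / np.log(num_classes)
    return labels, uncertainty, counts


# Toy usage: three consistent points in one cell, two conflicting points in another.
points = [[0.5, 0.5], [0.2, 0.8], [0.9, 0.1], [1.5, 1.5], [1.2, 1.7]]
classes = [1, 1, 1, 0, 2]
labels, uncertainty, counts = bev_pseudo_labels(points, classes, num_classes=3,
                                                grid_size=(2, 2))
```

In this sketch, a cell whose points all agree gets zero uncertainty, while a cell with conflicting predictions across timesteps gets a higher value, which matches the behavior the caption describes for the areas inside the white box.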