Real-Time Metric-Semantic Mapping for Autonomous Navigation in Outdoor Environments

Jianhao Jiao, Ruoyu Geng, Yuanhang Li, Ren Xin, Bowen Yang, Jin Wu, Lujia Wang, Ming Liu, Rui Fan, Dimitrios Kanoulas

TL;DR

The paper tackles real-time outdoor autonomous navigation by introducing an online metric-semantic mapping system that fuses LiDAR, vision, and IMU data into a GPU-accelerated TSDF framework with semantic labeling. It combines a LiDAR-Visual-Inertial state estimator, CNN-based pixel-wise segmentation, and a Bayesian fusion mechanism to produce a global 3D mesh annotated with semantic classes, from which traversable regions are extracted for map-based localization and planning. The approach is validated on 24 sequences across public and campus datasets, showing strong geometric and semantic accuracy with rapid per-frame updates (millisecond-scale), and demonstrates real-world point-to-point navigation using the generated maps. The work advances outdoor semantic mapping by enabling real-time, large-scale, semantically informed navigation and provides public code and datasets to foster reproducibility and further research.
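The Bayesian fusion step mentioned above can be illustrated with a minimal sketch. It assumes each voxel stores a categorical distribution over semantic classes and that each new frame contributes a per-class probability vector (for example, the softmax output of the segmentation network at the pixel the voxel projects to). The function names and the three-class example are hypothetical and are not taken from the paper or its released code.

```python
def fuse_label_probabilities(prior, likelihood):
    """Recursive Bayesian update of one voxel's class distribution.

    prior:      current per-class probabilities stored in the voxel
    likelihood: per-class probabilities observed in the new frame
    Returns the normalized posterior (assumption: observations are
    treated as conditionally independent across frames).
    """
    if len(prior) != len(likelihood):
        raise ValueError("prior and likelihood must have the same length")
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalized)
    if total <= 0.0:
        # Degenerate observation: keep the previous belief unchanged.
        return list(prior)
    return [u / total for u in unnormalized]


def most_likely_label(probabilities):
    """Index of the class with the highest posterior probability."""
    return max(range(len(probabilities)), key=probabilities.__getitem__)


# Hypothetical classes: 0 = road, 1 = sidewalk, 2 = vegetation.
voxel = [1.0 / 3.0] * 3  # uniform prior before any observation
observations = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],  # one conflicting frame
]
for observation in observations:
    voxel = fuse_label_probabilities(voxel, observation)

print(most_likely_label(voxel), [round(p, 3) for p in voxel])
```

In this sketch, a single conflicting frame lowers the confidence in the dominant class but does not flip the label, which is the usual motivation for accumulating evidence across frames instead of keeping only the latest prediction.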

Abstract

The creation of a metric-semantic map, which encodes human-prior knowledge, represents a high-level abstraction of environments. However, constructing such a map poses challenges related to the fusion of multi-modal sensor data, the attainment of real-time mapping performance, and the preservation of structural and semantic information consistency. In this paper, we introduce an online metric-semantic mapping system that utilizes LiDAR-Visual-Inertial sensing to generate a global metric-semantic mesh map of large-scale outdoor environments. Leveraging GPU acceleration, our mapping process achieves exceptional speed, with frame processing taking less than 7ms, regardless of scenario scale. Furthermore, we seamlessly integrate the resultant map into a real-world navigation system, enabling metric-semantic-based terrain assessment and autonomous point-to-point navigation within a campus environment. Through extensive experiments conducted on both publicly available and self-collected datasets comprising 24 sequences, we demonstrate the effectiveness of our mapping and navigation methodologies. Code has been publicly released: https://github.com/gogojjh/cobra

Paper Structure

This paper contains 37 sections, 6 equations, 9 figures, 4 tables.

Figures (9)

  • Figure 1: For a robot (such as the vehicle shown in the figure) to navigate a complicated environment or carry out high-level or interactive tasks, it needs semantic information that categorizes surrounding objects in a human-readable format.
  • Figure 2: Block diagram illustrating the full pipeline of the proposed mapping system. The system starts with state estimation (see Section \ref{sec:mapping_state_estimator}). The segmentation module (see Section \ref{sec:semantic_segmentation}) annotates each image pixel with a label. The measurement processing module converts point clouds into range and depth images. The mapping module (see Section \ref{sec:mapping_metric_semantic_mapping}) constructs a global metric-semantic mesh map. Traversable regions are extracted from the resulting map (see Section \ref{sec:mapping_traversability}), which is then used for localization and for generating a collision-free path with a motion planning algorithm (see Section \ref{sec:mapping_navigation}).
  • Figure 3: The non-projective distance uses the local planarity of surfaces to approximate the true distance $d_i$. $\psi_{i}$ is the projective distance of the voxel. The gradient vector is computed as the weighted average of normal vectors. $d_{i}$ is calculated according to Eq. \ref{equ:distance}. The radius of the curved surface (approximated as a circle) in (b) is marked as $r$.
  • Figure 4: (a) The mapping device, which consists of a high-resolution LiDAR and a camera, is used to collect data for environmental mapping. (b) The real-world vehicle provides a platform for testing the navigation system.
  • Figure 5: We show a few samples from our dataset (top) and the corresponding annotations (bottom). All images were collected on the campus.
  • ...and 4 more figures
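
The measurement processing step described in the Figure 2 caption, which converts point clouds into range images, can be sketched with a standard spherical projection. This is a minimal illustration under stated assumptions, not the paper's implementation: the function name, the image size, and the vertical field of view are hypothetical, and each pixel simply keeps the closest return.

```python
import math


def point_cloud_to_range_image(points, width, height, fov_up_deg, fov_down_deg):
    """Project 3D points (x, y, z) in the sensor frame into a range image.

    Columns index azimuth over 360 degrees; rows index elevation between
    fov_down_deg and fov_up_deg. Pixels without a return hold math.inf.
    """
    fov_up = math.radians(fov_up_deg)
    fov_down = math.radians(fov_down_deg)
    fov = fov_up - fov_down
    image = [[math.inf] * width for _ in range(height)]

    for x, y, z in points:
        rng = math.sqrt(x * x + y * y + z * z)
        if rng == 0.0:
            continue  # a point at the sensor origin has no direction
        azimuth = math.atan2(y, x)      # in [-pi, pi]
        elevation = math.asin(z / rng)  # in [-pi/2, pi/2]
        if elevation > fov_up or elevation < fov_down:
            continue  # outside the assumed vertical field of view

        col = int(0.5 * (1.0 - azimuth / math.pi) * width)
        row = int((fov_up - elevation) / fov * height)
        col = min(width - 1, max(0, col))
        row = min(height - 1, max(0, row))

        # Keep the nearest return when several points share a pixel.
        if rng < image[row][col]:
            image[row][col] = rng
    return image


# Hypothetical 8 x 4 image with a +/-15 degree vertical field of view.
cloud = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
range_image = point_cloud_to_range_image(cloud, 8, 4, 15.0, -15.0)
```

Storing the scan as an image lets each voxel look up its measurement by projection instead of searching the point cloud, which is one reason this representation suits GPU-parallel map updates.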