S3M: Semantic Segmentation Sparse Mapping for UAVs with RGB-D Camera

Thanh Nguyen Canh, Van-Truong Nguyen, Xiem HoangVan, Armagan Elibol, Nak Young Chong

TL;DR

The paper addresses the challenge of indoor UAV perception by integrating semantic information into real-time mapping under limited compute. It introduces S3M, a pipeline that combines ORB-SLAM3 for accurate 6-DoF pose estimation, PSPNet-based semantic segmentation, semantic fusion, and OctoMap-based semantic-aware mapping to produce an object-level, memory-efficient representation. The approach is validated in Gazebo simulations and on Jetson Xavier AGX, showing real-time performance and improved scene understanding for navigation and task execution. This work advances autonomous UAV operations in cluttered indoor environments by providing a scalable, semantically enriched map that supports higher-level decision making and planning.

Abstract

Unmanned Aerial Vehicles (UAVs) hold immense potential for critical applications such as search and rescue, where accurate perception of indoor environments is paramount. However, running localization, 3D reconstruction, and semantic segmentation concurrently is a notable hurdle, especially for UAVs with constrained power and computational resources. This paper presents a novel approach to extracting and using semantic information in UAV operations. Our system integrates state-of-the-art visual SLAM, which estimates a full 6-DoF pose, with advanced object segmentation methods at the back end. To improve the computational and storage efficiency of the framework, we adopt OctoMap, a streamlined voxel-based 3D map representation, to build a working system. Furthermore, a fusion algorithm is incorporated to obtain the semantic information of each frame from the front-end SLAM task and of the corresponding point. By leveraging semantic information, our framework enhances the UAV's ability to perceive and navigate through indoor spaces, addressing challenges in pose estimation accuracy and uncertainty reduction. We validate the efficacy of the proposed system in Gazebo simulations and successfully embed our approach into a Jetson Xavier AGX unit for real-world applications.
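
To make the fusion idea concrete, the following is a minimal sketch of how per-pixel labels, depth, and a camera pose could be combined into a labeled voxel map. It is not the authors' implementation: it replaces OctoMap's octree and occupancy model with a flat dictionary of voxels and a simple majority vote, and every name in it (`SemanticVoxelMap`, `backproject`, `to_world`, `voxel_key`, the intrinsics tuple) is hypothetical. The pose is assumed to be a camera-to-world rotation `R` and translation `t`, such as a SLAM front end might supply.

```python
import math
from collections import Counter, defaultdict


def backproject(u, v, depth, fx, fy, cx, cy):
    """Pinhole back-projection of pixel (u, v) with metric depth to camera-frame XYZ."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def to_world(p_cam, R, t):
    """Apply an assumed camera-to-world pose: p_world = R * p_cam + t."""
    return tuple(
        sum(R[i][j] * p_cam[j] for j in range(3)) + t[i] for i in range(3)
    )


def voxel_key(p, resolution):
    """Quantize a 3D point to the integer index of the voxel containing it."""
    return tuple(math.floor(c / resolution) for c in p)


class SemanticVoxelMap:
    """Flat voxel hash with per-voxel label votes (a stand-in for an octree map)."""

    def __init__(self, resolution=0.1):
        self.resolution = resolution
        self.votes = defaultdict(Counter)  # voxel index -> label counts

    def integrate(self, pixels, R, t, intrinsics):
        """Fuse one frame. `pixels` is an iterable of (u, v, depth, label)."""
        fx, fy, cx, cy = intrinsics
        for u, v, depth, label in pixels:
            if depth <= 0:  # skip invalid depth readings
                continue
            p = to_world(backproject(u, v, depth, fx, fy, cx, cy), R, t)
            self.votes[voxel_key(p, self.resolution)][label] += 1

    def label_of(self, key):
        """Return the majority label of a voxel, or None if it was never observed."""
        counts = self.votes.get(key)
        return counts.most_common(1)[0][0] if counts else None
```

Calling `integrate` once per frame accumulates votes across viewpoints, so a voxel seen from several poses keeps the label observed most often. A real octree map would additionally track occupancy and merge uniform regions to save memory, which this sketch leaves out.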

Paper Structure (12 sections, 5 equations, 9 figures, 1 algorithm)

This paper contains 12 sections, 5 equations, 9 figures, 1 algorithm.

Figures (9)

  • Figure 1: System Architecture: The system is composed of three units: a full 6 DoF pose estimation of the drone through ORB-SLAM3 (Tracking part - yellow, Local Mapping part - blue, Loop Closing part - green), a 3D semantic segmentation branch, and a semantic fusion scheme
  • Figure 2: Structure of semantic segmentation
  • Figure 3: UAV and Gazebo simulation environment
  • Figure 4: Comparison of trajectories for ORB-SLAM2, our method, and ground truth in the X-Y plane
  • Figure 5: Comparison of trajectories for ORB-SLAM2, our method, and ground truth in the X-Z plane
  • ...and 4 more figures