S3M: Semantic Segmentation Sparse Mapping for UAVs with RGB-D Camera

Thanh Nguyen Canh; Van-Truong Nguyen; Xiem HoangVan; Armagan Elibol; Nak Young Chong

TL;DR

The paper addresses the challenge of indoor UAV perception by integrating semantic information into real-time mapping under limited compute. It introduces S3M, a pipeline that combines ORB-SLAM3 for accurate 6-DoF pose estimation, PSPNet-based semantic segmentation, semantic fusion, and OctoMap-based semantic-aware mapping to produce an object-level, memory-efficient representation. The approach is validated in Gazebo simulations and on Jetson Xavier AGX, showing real-time performance and improved scene understanding for navigation and task execution. This work advances autonomous UAV operations in cluttered indoor environments by providing a scalable, semantically enriched map that supports higher-level decision making and planning.

Abstract

Unmanned Aerial Vehicles (UAVs) hold immense potential for critical applications, such as search and rescue operations, where accurate perception of indoor environments is paramount. However, performing localization, 3D reconstruction, and semantic segmentation concurrently is a notable hurdle, especially for UAVs with constrained power and computational resources. This paper presents a novel approach to address challenges in semantic information extraction and utilization within UAV operations. Our system integrates state-of-the-art visual SLAM, which estimates a full 6-DoF pose, with advanced object segmentation methods at the back end. To improve the computational and storage efficiency of the framework, we adopt a streamlined voxel-based 3D map representation, OctoMap, to build a working system. Furthermore, a fusion algorithm is incorporated to combine the semantic information of each frame from the front-end SLAM task with the corresponding points. By leveraging semantic information, our framework enhances the UAV's ability to perceive and navigate through indoor spaces, addressing challenges in pose estimation accuracy and uncertainty reduction. We validate the efficacy of the proposed system in Gazebo simulations and successfully embed our approach into a Jetson Xavier AGX unit for real-world applications.
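
The fusion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a pinhole camera model, a 4x4 camera-to-world pose such as a SLAM front end might supply, and a per-pixel label image such as a segmentation network might output. A plain dictionary of voxel indices stands in for OctoMap, and a simple majority vote stands in for the paper's fusion rule. All function names and parameters are hypothetical.

```python
import math
from collections import Counter, defaultdict

def backproject(u, v, depth, fx, fy, cx, cy):
    """Pinhole back-projection of pixel (u, v) with metric depth into the camera frame."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

def transform(pose, p):
    """Apply a 4x4 row-major camera-to-world pose to a 3D point."""
    return tuple(
        pose[r][0] * p[0] + pose[r][1] * p[1] + pose[r][2] * p[2] + pose[r][3]
        for r in range(3)
    )

def fuse_frame(voxels, depth_img, label_img, pose, intrinsics, voxel_size=0.1):
    """Accumulate one frame's per-pixel labels as votes in a sparse voxel map."""
    fx, fy, cx, cy = intrinsics
    for v, (depth_row, label_row) in enumerate(zip(depth_img, label_img)):
        for u, (d, label) in enumerate(zip(depth_row, label_row)):
            if d <= 0:  # skip pixels without a valid depth reading
                continue
            pw = transform(pose, backproject(u, v, d, fx, fy, cx, cy))
            key = tuple(math.floor(c / voxel_size) for c in pw)
            voxels[key][label] += 1

def voxel_labels(voxels):
    """Return the majority label of every occupied voxel."""
    return {key: votes.most_common(1)[0][0] for key, votes in voxels.items()}

# Toy 2x2 frame: identity pose, unit focal length, one invalid depth pixel.
identity = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
voxels = defaultdict(Counter)
depth = [[1.0, 1.0], [0.0, 1.0]]
labels = [["wall", "wall"], ["chair", "chair"]]
fuse_frame(voxels, depth, labels, identity, (1.0, 1.0, 0.5, 0.5), voxel_size=1.0)
semantic_map = voxel_labels(voxels)
```

Only occupied voxels are stored, which mirrors the memory-saving motivation for a sparse map. A real OctoMap additionally maintains occupancy probabilities in an octree, which this sketch omits.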

Paper Structure (12 sections, 5 equations, 9 figures, 1 algorithm)

Introduction
Methodology
Pose estimation
Semantic segmentation
Semantic fusion
Semantic map creation
Experimental Results
UAVs Simulation
Pose estimation evaluation
Training and evaluation on SUNRGBD dataset
Semantic Map
Conclusion

Figures (9)

Figure 1: System architecture. The system is composed of three units: full 6-DoF pose estimation of the drone through ORB-SLAM3 (Tracking part - yellow, Local Mapping part - blue, Loop Closing part - green), a 3D semantic segmentation branch, and a semantic fusion scheme
Figure 2: Structure of semantic segmentation
Figure 3: UAV and Gazebo simulation environment
Figure 4: Comparison of trajectories for ORB-SLAM2, our method, and ground truth in the X-Y plane
Figure 5: Comparison of trajectories for ORB-SLAM2, our method, and ground truth in the X-Z plane
...and 4 more figures
