Semantic Visual Simultaneous Localization and Mapping: A Survey

Kaiqi Chen, Junhao Xiao, Jialing Liu, Qiyi Tong, Heng Zhang, Ruyu Liu, Jianhua Zhang, Arash Ajoudani, Shengyong Chen

TL;DR

Semantic vSLAM extends traditional vSLAM by integrating semantic information to handle dynamic environments and produce meaningful maps. The survey reviews semantic information extraction methods (object detection, semantic segmentation, instance segmentation), semantic object association strategies, and semantic applications in localization and mapping, along with a comprehensive analysis of 31 datasets. It compares semantic vSLAM with traditional approaches, outlining development trends and performance trade-offs. The paper also outlines future directions including multimodal data fusion and multi-robot collaboration to enhance robustness and scalability in real-world robotics and AR/VR applications.

Abstract

Visual Simultaneous Localization and Mapping (vSLAM) has achieved great progress in the computer vision and robotics communities and has been successfully used in many fields, such as autonomous robot navigation and AR/VR. However, vSLAM cannot achieve good localization in dynamic and complex environments. In recent years, numerous publications have reported that semantic vSLAM systems, which combine semantic information with vSLAM, are capable of solving these problems. Nevertheless, there is no comprehensive survey of semantic vSLAM. To fill the gap, this paper first reviews the development of semantic vSLAM, explicitly focusing on its strengths and differences. Second, we explore three main issues of semantic vSLAM: the extraction and association of semantic information, the application of semantic information, and the advantages of semantic vSLAM. Then, we collect and analyze the current state-of-the-art SLAM datasets that have been widely used in semantic vSLAM systems. Finally, we discuss future directions that will provide a blueprint for the future development of semantic vSLAM.

Paper Structure (22 sections, 6 equations, 3 figures, 2 tables)

Figures (3)

  • Figure 1: The schematic diagram of the overall structure of the paper.
  • Figure 2: The overall framework of semantic robotics. Semantic visual SLAM consists of semantic information extraction and visual SLAM modules, which influence each other. Semantic visual SLAM is widely used in autonomous driving, path planning, and navigation.
  • Figure 3: (a)(b) Images of scenes from different perspectives. (c) 3D map based on point cloud representation of traditional vSLAM. (d) Environment reconstruction with semantic information.