Semantic Visual Simultaneous Localization and Mapping: A Survey
Kaiqi Chen, Junhao Xiao, Jialing Liu, Qiyi Tong, Heng Zhang, Ruyu Liu, Jianhua Zhang, Arash Ajoudani, Shengyong Chen
TL;DR
Semantic vSLAM extends traditional vSLAM by integrating semantic information to handle dynamic environments and produce meaningful maps. The survey reviews semantic information extraction methods (object detection, semantic segmentation, instance segmentation), semantic object association strategies, and applications of semantic information in localization and mapping, along with a comprehensive analysis of 31 datasets. It compares semantic vSLAM with traditional approaches, highlighting development trends and performance trade-offs. The paper also outlines future directions, including multimodal data fusion and multi-robot collaboration, to enhance robustness and scalability in real-world robotics and AR/VR applications.
Abstract
Visual Simultaneous Localization and Mapping (vSLAM) has made great progress in the computer vision and robotics communities and has been successfully applied in many fields, such as autonomous robot navigation and AR/VR. However, vSLAM fails to achieve accurate localization in dynamic and complex environments. In recent years, numerous publications have reported that semantic vSLAM systems, which combine semantic information with vSLAM, are capable of solving these problems. Nevertheless, there is no comprehensive survey of semantic vSLAM. To fill this gap, this paper first reviews the development of semantic vSLAM, focusing explicitly on its strengths and its differences from traditional vSLAM. Second, we explore three main issues of semantic vSLAM: the extraction and association of semantic information, the application of semantic information, and the advantages of semantic vSLAM. Then, we collect and analyze the current state-of-the-art SLAM datasets that have been widely used in semantic vSLAM systems. Finally, we discuss future directions that provide a blueprint for the future development of semantic vSLAM.
