SlideSLAM: Sparse, Lightweight, Decentralized Metric-Semantic SLAM for Multi-Robot Navigation

Xu Liu; Jiuzhou Lei; Ankit Prabhu; Yuezhan Tao; Igor Spasojevic; Pratik Chaudhari; Nikolay Atanasov; Vijay Kumar

TL;DR

SlideSLAM introduces a real-time decentralized metric-semantic SLAM framework that enables heterogeneous robot teams to collaboratively construct sparse, object-level maps from RGBD and LiDAR sensors. The system combines a fast front-end for semantic object detection with a back-end factor-graph optimization that jointly estimates robot poses and object landmarks, supported by two place-recognition approaches (SlideMatch and SlideGraph) for inter-robot loop closures. By sharing lightweight semantic observations and performing decentralized optimization, the approach achieves real-time operation on SWaP-constrained platforms and scales to multi-robot exploration, demonstrated across indoor, outdoor, forest, and public-dataset benchmarks. The work includes extensive experiments, quantitative analyses, and an open-source release, highlighting low communication bandwidth, robust inter-robot localization, and large-scale semantic mapping capabilities.

Abstract

This paper develops a real-time decentralized metric-semantic SLAM algorithm that enables a heterogeneous robot team to collaboratively construct object-based metric-semantic maps. The proposed framework integrates a data-driven front-end for instance segmentation from either RGBD cameras or LiDARs and a custom back-end for optimizing robot trajectories and object landmarks in the map. To allow multiple robots to merge their information, we design semantics-driven place recognition algorithms that leverage the informativeness and viewpoint invariance of the object-level metric-semantic map for inter-robot loop closure detection. A communication module is designed to track each robot's observations and those of other robots whenever communication links are available. The framework supports real-time, decentralized operation onboard the robots and has been integrated with three types of aerial and ground platforms. We validate its effectiveness through experiments in both indoor and outdoor environments, as well as benchmarks on public datasets and comparisons with existing methods. The framework is open-sourced and suitable for both single-agent and multi-robot real-time metric-semantic SLAM applications. The code is available at: https://github.com/KumarRobotics/SLIDE_SLAM.

Paper Structure (47 sections, 15 equations, 10 figures, 7 tables)

Introduction
Related Work
Metric-semantic SLAM
Place recognition
Multi-robot SLAM
Semantics-in-the-loop navigation and exploration
Problem Formulation
Preliminaries
Dec-Metric-Semantic SLAM for multi-robot exploration
Metric-Semantic SLAM
Approach overview
Map representation
Object detection and modeling
Factor graph optimization with object models
Cuboid factors
...and 32 more sections

Figures (10)

Figure 1: Robot platforms used in our experiments. We utilize three types of robots for our experiments: two aerial platforms, the Falcon 250 UAV (left) and the Falcon 4 UAV (middle), and one ground platform, the Scarab UGV (right). The Light Detection and Ranging (LiDAR)-equipped robot (Falcon 4) is primarily used for outdoor operations due to its size and superior sensing capabilities. The RGB and Depth (RGBD) camera-based robots (Falcon 250 and Scarab) are more suitable for cluttered indoor environments due to their smaller footprints. All three platforms have GPS-denied autonomous navigation capabilities, enabling them to safely explore cluttered environments using only onboard computation and sensing.
Figure 2: Metric-semantic SLAM results from seven data sequences collected by heterogeneous robots. Trajectories in different colors correspond to different data sequences. Fig. 2(a) shows a 3D reconstruction of the Pennovation campus at the University of Pennsylvania. Outdoor objects, such as vehicles, tree trunks, and light poles, are mapped as shown in Fig. 2(e). Indoor objects, such as chairs, tables, and monitors, are mapped as shown in Fig. 2(h). Fig. 2(b) shows the same metric-semantic map overlaid on an accumulated point cloud constructed by our Falcon 4 UAV. Fig. 2(c) shows an orthophoto depicting the merged metric-semantic map of three parking lots and two buildings constructed by seven robots. Fig. 2(g) and Fig. 2(h) show a zoomed-in view of one of the lab buildings.
Figure 3: System Diagram. Our system takes in data streams from each robot's onboard sensors, which can be either an RGBD camera or a LiDAR, and performs instance segmentation to extract semantic object features. Meanwhile, low-level odometry, either Visual-Inertial Odometry (VIO) or LiDAR-Inertial Odometry (LIO), provides relative-motion estimates between consecutive key poses. Next, the metric-semantic SLAM algorithm takes in these semantic observations and relative-motion estimates, and constructs a factor graph consisting of both robot pose nodes and object landmark nodes. Meanwhile, our multi-robot communication module (see Figure 4) opportunistically leverages connectivity to share lightweight semantic observations among robots in a decentralized way. Based on this shared information, our metric-semantic place recognition algorithm checks for possible inter-robot loop closures at a fixed rate. Once a loop closure is detected, the resulting transformation between each pair of robots is used to transform all observations into each robot's reference frame. These observations are then added to each robot's own factor graph, forming a merged metric-semantic map. Note that the entire perception-action loop runs in a decentralized manner onboard each robot. Besides the obvious differences in control algorithms, the planning modules and the front-end processing algorithms also differ across robot platforms. This is due to the need to accommodate the differences in sensing modalities (RGBD and LiDAR), operating environments (indoor, urban, and forest), and traversal modes (ground and aerial). However, the core metric-semantic SLAM framework remains the same.
Figure 4: Multi-robot collaboration module for decentralized metric-semantic SLAM. The robots share lightweight metric-semantic observations necessary for constructing the factors between object landmarks and robot poses in the factor graph, which include the detected objects and the odometry relative motion estimate (w.r.t. the previous pose) associated with each key pose in the factor graph. Once the metric-semantic place recognition module successfully finds a loop closure with another robot, the shared observations from that robot will be transformed into the current robot's reference frame and added to the factor graph of the current robot.
Figure 5: Metric-semantic SLAM results on the KITTI dataset. The top, middle, and bottom panels show the results of experiments involving one, two, and three robots, respectively. Each cuboid represents a vehicle while each cylinder represents either a tree trunk or a light pole. The estimated robot trajectories are shown in orange, red, and blue for the first, second, and third robots, respectively.
...and 5 more figures
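
The captions for Figures 3 and 4 describe how, once an inter-robot loop closure is found, observations shared by another robot are transformed into the current robot's reference frame before being added to its factor graph. The following is a minimal sketch of that frame-alignment step only, not the authors' implementation. The names ObjectObservation, make_transform, transform_point, and to_own_frame are hypothetical, objects are reduced to a label and a 3D centroid, and the loop-closure transform is assumed to be a yaw rotation plus a translation for brevity.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence

Matrix = List[List[float]]  # 4x4 homogeneous transform, row-major


@dataclass
class ObjectObservation:
    """Hypothetical shared observation: a class label and a 3D centroid."""
    label: str
    position: Sequence[float]  # (x, y, z) in the sender's reference frame


def make_transform(yaw: float, translation: Sequence[float]) -> Matrix:
    """Build a 4x4 transform from a yaw angle (radians) and a translation.

    Assumption: yaw-only rotation, chosen to keep the sketch short.
    """
    c, s = math.cos(yaw), math.sin(yaw)
    tx, ty, tz = translation
    return [
        [c, -s, 0.0, tx],
        [s, c, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def transform_point(T: Matrix, p: Sequence[float]) -> List[float]:
    """Apply a 4x4 homogeneous transform to a 3D point."""
    x, y, z = p
    return [T[i][0] * x + T[i][1] * y + T[i][2] * z + T[i][3] for i in range(3)]


def to_own_frame(T_own_other: Matrix,
                 shared: List[ObjectObservation]) -> List[ObjectObservation]:
    """Re-express another robot's observations in this robot's frame."""
    return [
        ObjectObservation(o.label, transform_point(T_own_other, o.position))
        for o in shared
    ]


# Illustrative values: the other robot's frame is rotated 90 degrees about z
# and offset by 2 m along x relative to this robot's frame.
T_own_other = make_transform(math.pi / 2, (2.0, 0.0, 0.0))
shared = [
    ObjectObservation("car", (1.0, 0.0, 0.0)),
    ObjectObservation("tree_trunk", (0.0, 3.0, 0.5)),
]
merged = to_own_frame(T_own_other, shared)
```

In this sketch, merged holds the other robot's objects expressed in the current robot's frame. A full system would also transform the shared key poses and object orientations, and would then add the corresponding factors to its own graph.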
