SIREN: Semantic, Initialization-Free Registration of Multi-Robot Gaussian Splatting Maps

Ola Shorinwa, Jiankai Sun, Mac Schwager, Anirudha Majumdar

TL;DR

SIREN registers multi-robot GSplat maps without camera poses or inter-map initialization by using open-vocabulary semantics to locate informative regions and establish reliable Gaussian correspondences. Its pipeline has three stages: semantic feature extraction and matching, coarse Gaussian-to-Gaussian registration, and fine photometric registration with novel-view synthesis and semantics-based filtering. Together these produce high-fidelity fused maps. In the most challenging scenes, the method reports about 90x smaller rotation errors, 300x smaller translation errors, and 44x smaller scale errors than competing baselines, across real-world datasets and heterogeneous hardware, while maintaining photometric realism. The paper also reports ablations, and finetuning further improves visual fidelity. Because fusion does not rely on pose or image data, the approach extends multi-robot mapping to large-scale and open-world robotics tasks with maps from diverse platforms.
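As a rough illustration of the matching stage, the sketch below pairs Gaussians from two maps by cosine similarity of their semantic feature vectors and keeps only mutual nearest neighbors. This is an assumption-laden stand-in, not the paper's matching rule: the function name, the `min_similarity` threshold, and the mutual-nearest-neighbor criterion are all hypothetical choices made for the example.

```python
import numpy as np

def mutual_nearest_neighbors(feats_a, feats_b, min_similarity=0.8):
    """Return index pairs (i, j) where a_i and b_j are each other's best cosine match.

    feats_a: (N, D) per-Gaussian semantic features from map A (hypothetical input).
    feats_b: (M, D) per-Gaussian semantic features from map B (hypothetical input).
    min_similarity: illustrative threshold, not a value from the paper.
    """
    # Normalize rows so that a dot product equals cosine similarity.
    a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
    sim = a @ b.T  # (N, M) cosine similarity matrix

    best_b_for_a = sim.argmax(axis=1)  # best match in B for each row of A
    best_a_for_b = sim.argmax(axis=0)  # best match in A for each row of B

    idx_a = np.arange(len(a))
    # Keep a pair only if the match is mutual and sufficiently similar.
    mutual = best_a_for_b[best_b_for_a] == idx_a
    strong = sim[idx_a, best_b_for_a] >= min_similarity
    keep = mutual & strong
    return np.stack([idx_a[keep], best_b_for_a[keep]], axis=1)
```

The mutual check discards one-sided matches, which is a common way to reduce outliers before a geometric optimization step; a robust estimator would typically still be needed downstream.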

Abstract

We present SIREN for registration of multi-robot Gaussian Splatting (GSplat) maps, with zero access to camera poses, images, and inter-map transforms for initialization or fusion of local submaps. To realize these capabilities, SIREN harnesses the versatility and robustness of semantics in three critical ways to derive a rigorous registration pipeline for multi-robot GSplat maps. First, SIREN utilizes semantics to identify feature-rich regions of the local maps where the registration problem is better posed, eliminating the need for any initialization which is generally required in prior work. Second, SIREN identifies candidate correspondences between Gaussians in the local maps using robust semantic features, constituting the foundation for robust geometric optimization, coarsely aligning 3D Gaussian primitives extracted from the local maps. Third, this key step enables subsequent photometric refinement of the transformation between the submaps, where SIREN leverages novel-view synthesis in GSplat maps along with a semantics-based image filter to compute a high-accuracy non-rigid transformation for the generation of a high-fidelity fused map. We demonstrate the superior performance of SIREN compared to competing baselines across a range of real-world datasets, and in particular, across the most widely-used robot hardware platforms, including a manipulator, drone, and quadruped. In our experiments, SIREN achieves about 90x smaller rotation errors, 300x smaller translation errors, and 44x smaller scale errors in the most challenging scenes, where competing methods struggle. We will release the code and provide a link to the project page after the review process.
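To make the rotation, translation, and scale quantities concrete, the following sketch estimates a similarity transform from matched 3D points (for example, the centers of corresponding Gaussians) using the closed-form Umeyama least-squares solution. It is a textbook technique swapped in for illustration; the abstract does not say that SIREN's coarse registration uses this solver, and the function names here are hypothetical.

```python
import numpy as np

def umeyama_similarity(src, dst):
    """Least-squares scale s, rotation R, translation t with dst ~ s * R @ src + t.

    src, dst: (N, 3) arrays of matched points (hypothetical inputs, e.g. Gaussian centers).
    """
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst

    # Cross-covariance between the centered point sets.
    cov = dst_c.T @ src_c / len(src)
    U, D, Vt = np.linalg.svd(cov)

    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0
    R = U @ S @ Vt

    var_src = (src_c ** 2).sum() / len(src)
    s = np.trace(np.diag(D) @ S) / var_src
    t = mu_dst - s * R @ mu_src
    return s, R, t

def rotation_error_deg(R_est, R_true):
    """Geodesic angle between two rotation matrices, in degrees."""
    cos_angle = (np.trace(R_est.T @ R_true) - 1.0) / 2.0
    return np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
```

This closed form assumes outlier-free correspondences; with noisy semantic matches it would usually be wrapped in a robust scheme, and its output would serve only as the starting point for a photometric refinement stage.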

Paper Structure

This paper contains 19 sections, 13 equations, 9 figures, 7 tables.

Figures (9)

  • Figure 1: SIREN consists of three steps: (a) semantic feature extraction and matching of Gaussians across the local maps, (b) coarse Gaussian-to-Gaussian registration for coarsely aligning the local maps, (c) fine photometric registration for high-accuracy fusion of the local maps, through image-to-image registration and bundle adjustment.
  • Figure 2: SIREN: Multi-Robot Map Registration
  • Figure 3: Although RANSAC-GR achieves the highest mean PSNR and SSIM scores and the lowest LPIPS score in the Truck scene, RANSAC-GR does not accurately register the individual GSplat maps. While the right side of the truck in the RANSAC-GR fused map looks similar to the ground-truth image (shown in the top panel), the left side of the truck is missing (shown in the bottom panel). The standard deviation of the PSNR, SSIM, and LPIPS scores achieved by RANSAC-GR reflects the actual registration performance of the method.
  • Figure 4: Rendered images from the fused GSplat maps of the Playroom, Truck, and Room scenes. SIREN generates high-fidelity fused GSplat maps, evidenced by the precise geometric detail in the images, visible in the regions indicated by the green squares. Inaccurate registration of GSplat maps generally results in artifacts in the rendered images.
  • Figure 5: Stillshots of a quadruped mapping different areas of a kitchen and workshop and a drone mapping an apartment-like scene. Each robot trains independent GSplat submaps of the areas it mapped. The submaps of each scene are registered to obtain a composite map covering the entirety of the scene.
  • ...and 4 more figures