SIREN: Semantic, Initialization-Free Registration of Multi-Robot Gaussian Splatting Maps
Ola Shorinwa, Jiankai Sun, Mac Schwager, Anirudha Majumdar
TL;DR
SIREN registers multi-robot GSplat maps without camera poses, images, or inter-map initializations by exploiting open-vocabulary semantics to locate informative regions and establish reliable Gaussian correspondences. It uses a three-stage SCF pipeline (semantic feature extraction and matching, coarse Gaussian-to-Gaussian registration, and fine photometric registration with novel-view synthesis and semantics-based filtering) to produce high-fidelity fused maps. In the most challenging scenes, the method achieves about 90x smaller rotation errors, 300x smaller translation errors, and 44x smaller scale errors than competing baselines across real-world datasets and heterogeneous hardware (a manipulator, a drone, and a quadruped), while maintaining photometric realism; ablations are reported, and finetuning further improves visual fidelity. By removing the reliance on pose and image data during fusion, the approach broadens multi-robot mapping to large-scale and open-world robotics tasks, enabling robust, high-quality environment models from diverse platforms.
Abstract
We present SIREN for registration of multi-robot Gaussian Splatting (GSplat) maps, with zero access to camera poses, images, and inter-map transforms for initialization or fusion of local submaps. To realize these capabilities, SIREN harnesses the versatility and robustness of semantics in three critical ways to derive a rigorous registration pipeline for multi-robot GSplat maps. First, SIREN utilizes semantics to identify feature-rich regions of the local maps where the registration problem is better posed, eliminating the need for any initialization, which is generally required in prior work. Second, SIREN identifies candidate correspondences between Gaussians in the local maps using robust semantic features. These correspondences constitute the foundation for a robust geometric optimization that coarsely aligns 3D Gaussian primitives extracted from the local maps. Third, this coarse alignment enables subsequent photometric refinement of the transformation between the submaps, where SIREN leverages novel-view synthesis in GSplat maps along with a semantics-based image filter to compute a high-accuracy non-rigid transformation for the generation of a high-fidelity fused map. We demonstrate the superior performance of SIREN compared to competing baselines across a range of real-world datasets, and in particular, across the most widely used robot hardware platforms, including a manipulator, drone, and quadruped. In our experiments, SIREN achieves about 90x smaller rotation errors, 300x smaller translation errors, and 44x smaller scale errors in the most challenging scenes, where competing methods struggle. We will release the code and provide a link to the project page after the review process.
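As an illustration of what the coarse Gaussian-to-Gaussian stage has to recover (a rotation, a translation, and a scale between two submaps), the sketch below estimates a similarity transform from matched Gaussian means using the classical closed-form Umeyama solution. This is an assumption made for illustration, not SIREN's solver: the abstract does not specify the optimization, and the function name `umeyama_similarity`, the synthetic points, and the outlier-free correspondences are all hypothetical. A real pipeline would need robust handling of wrong semantic matches.

```python
import numpy as np

def umeyama_similarity(src, dst):
    """Closed-form least-squares fit of dst ~ s * R @ src + t (Umeyama, 1991).

    src, dst: (N, 3) arrays of matched 3D points, e.g. the means of
    corresponding Gaussians in two submaps (assumed to be given here).
    Returns scale s, rotation R (3x3), and translation t (3,).
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = src.shape[0]

    # Center both point sets.
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst

    # Cross-covariance between the target and source points.
    cov = dst_c.T @ src_c / n
    U, D, Vt = np.linalg.svd(cov)

    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0

    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / n
    s = np.trace(np.diag(D) @ S) / var_src
    t = mu_dst - s * (R @ mu_src)
    return s, R, t

# Synthetic check: apply a known similarity transform, then recover it.
rng = np.random.default_rng(0)
points_a = rng.normal(size=(50, 3))            # stand-in for Gaussian means in map A
angle = 0.7
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0,            0.0,           1.0]])
s_true, t_true = 2.5, np.array([1.0, -2.0, 0.5])
points_b = s_true * points_a @ R_true.T + t_true  # same points expressed in map B

s_est, R_est, t_est = umeyama_similarity(points_a, points_b)
```

With noise-free correspondences the recovered `s_est`, `R_est`, and `t_est` match the ground truth up to floating-point error. The closed form is a least-squares estimate and is sensitive to mismatched pairs, which is one reason a robust formulation matters when correspondences come from semantic features.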
