ROMAN: Open-Set Object Map Alignment for Robust View-Invariant Global Localization

Mason B. Peterson, Yixuan Jia, Yulun Tian, Annika Thomas, Jonathan P. How

TL;DR

ROMAN tackles global localization under drastic viewpoint changes by building open-set object maps and aligning them with a gravity-aware, graph-based data association that fuses semantic (CLIP-based) and geometric (shape/volume) cues. The method introduces a unified submap alignment framework and augments the affinity metrics with metric-semantic attributes and a gravity prior, enabling reliable associations even when the maps were built by robots traveling in opposite directions. The authors report improvements over segment-based and image-based baselines across indoor, urban, and off-road scenarios, including up to 45% improvement in relative pose estimation and up to 35% reduction in trajectory error in challenging multi-robot SLAM sequences. ROMAN also supports cross-view localization (for example, between ground and aerial views) and compact, communication-efficient object maps, which makes it practical for drift-free navigation and collaborative SLAM in diverse environments.

Abstract

Global localization is a fundamental capability required for long-term and drift-free robot navigation. However, current methods fail to relocalize when faced with significantly different viewpoints. We present ROMAN (Robust Object Map Alignment Anywhere), a global localization method capable of localizing in challenging and diverse environments by creating and aligning maps of open-set and view-invariant objects. ROMAN formulates and solves a registration problem between object submaps using a unified graph-theoretic global data association approach with a novel incorporation of a gravity direction prior and object shape and semantic similarity. This work's open-set object mapping and information-rich object association algorithm enables global localization, even in instances when maps are created from robots traveling in opposite directions. Through a set of challenging global localization experiments in indoor, urban, and unstructured/forested environments, we demonstrate that ROMAN achieves higher relative pose estimation accuracy than other image-based pose estimation methods or segment-based registration methods. Additionally, we evaluate ROMAN as a loop closure module in large-scale multi-robot SLAM and show a 35% improvement in trajectory estimation error compared to standard SLAM systems using visual features for loop closures. Code and videos can be found at https://acl.mit.edu/roman.
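
As an illustration of the registration step mentioned in the abstract: once objects in two submaps have been associated, a relative pose can be recovered from the matched centroids. The sketch below uses the standard SVD-based least-squares alignment (often called the Kabsch or Arun method). It is a generic technique shown under stated assumptions, not necessarily the solver ROMAN uses; the function name `align_centroids` and the test data are hypothetical.

```python
import numpy as np

def align_centroids(src, dst):
    """Least-squares rigid transform (R, t) such that dst ≈ R @ src + t.

    src, dst: (N, 3) arrays of matched object centroids, row i of src
    associated with row i of dst. Hypothetical helper for illustration.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)

    # Center both point sets so only the rotation remains to be solved.
    src_mean = src.mean(axis=0)
    dst_mean = dst.mean(axis=0)
    H = (src - src_mean).T @ (dst - dst_mean)  # 3x3 cross-covariance

    U, _, Vt = np.linalg.svd(H)

    # Guard against a reflection (determinant -1) in the SVD solution.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    t = dst_mean - R @ src_mean
    return R, t

# Usage: two submaps related by a 180-degree yaw, as when two robots
# traverse the same area in opposite directions (synthetic data).
rng = np.random.default_rng(0)
src = rng.normal(size=(6, 3))
theta = np.pi
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([2.0, -1.0, 0.5])
dst = src @ R_true.T + t_true
R_est, t_est = align_centroids(src, dst)
```

The closed-form solution is only as good as the associations fed into it, which is why the data association stage carries most of the weight in this kind of pipeline.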

Paper Structure

This paper contains 24 sections, 6 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Pair of segment submaps matched by two robots traveling in opposite directions in an off-road environment. Associated segments found by the proposed method are connected by lines and projected onto the image plane. (Top) Each pair of associated segments is drawn with the same color. The remaining, unmatched segments are shown in random colors and all other background points are shown in gray. (Bottom) The same associated segments and their convex hulls are visualized in the original image observations. Further visualization is shown in the supplementary video.
  • Figure 2: Visualization of improved affinity metrics. The gravity-based distance score, $s_\text{gravity}$, promotes pairs of associations that are consistent with the direction of gravity, while $s_\text{shape}$ and $s_\text{semantic}$ encourage individual associations to be consistent in geometric shape and semantics, respectively.
  • Figure 3: ROMAN employs a front-end mapping module to create maps of open-set objects, representing each object with its centroid and feature descriptor. Local collections of objects are grouped into submaps and used for global localization by matching objects between two submaps. Accurate data association is achieved using a graph-theoretic formulation which leverages object shape and semantic similarity and a gravity prior.
  • Figure 4: Off-road qualitative pose graph trajectory estimates. Easy, medium, and hard cases compare ROMAN and KM as sources of loop closures. Different combinations of runs were paired to make the easy, medium, and hard cases. In the easy case, robots travel in the same direction; in the medium case, the two runs go in opposite directions except for the small connecting neck; and in the hard case, robots only cross paths going in opposite directions. Only ROMAN successfully finds loop closures between robots running in opposite directions.
  • Figure 5: Environmental setup used in the ground-aerial cross-view localization experiment as seen from both ground view (left) and aerial view (right).
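
The three scores named in the Figure 2 caption can be sketched roughly as follows. Only the symbol names $s_\text{gravity}$, $s_\text{shape}$, and $s_\text{semantic}$ come from the caption; the functional forms, the `sigma` parameter, and the assumption that both submaps are expressed in frames whose z-axis is aligned with gravity are illustrative choices, not the paper's definitions.

```python
import numpy as np

def s_semantic(f_a, f_b):
    """Assumed form: cosine similarity of two descriptors, clipped to [0, 1]."""
    f_a = np.asarray(f_a, dtype=float)
    f_b = np.asarray(f_b, dtype=float)
    cos = f_a @ f_b / (np.linalg.norm(f_a) * np.linalg.norm(f_b))
    return float(np.clip(cos, 0.0, 1.0))

def s_shape(vol_a, vol_b):
    """Assumed form: ratio of smaller to larger volume (1.0 means equal)."""
    return min(vol_a, vol_b) / max(vol_a, vol_b)

def s_gravity(p_i, p_j, q_i, q_j, sigma=0.5):
    """Assumed form: pairwise consistency of associations p_i<->q_i, p_j<->q_j.

    With gravity-aligned frames, a yaw-plus-translation transform preserves
    the horizontal distance and the signed vertical offset between two
    objects separately, so each is compared on its own.
    """
    dp = np.asarray(p_j, dtype=float) - np.asarray(p_i, dtype=float)
    dq = np.asarray(q_j, dtype=float) - np.asarray(q_i, dtype=float)
    horiz_err = np.linalg.norm(dp[:2]) - np.linalg.norm(dq[:2])
    vert_err = dp[2] - dq[2]
    return float(np.exp(-0.5 * (horiz_err**2 + vert_err**2) / sigma**2))

# A pair seen from the opposite direction (180-degree yaw) stays consistent.
consistent = s_gravity([0, 0, 0], [1, 0, 1], [5, 5, 0], [4, 5, 1])

# Same Euclidean distance between objects, but the vertical order is flipped:
# a plain distance check would accept this pair, the gravity-aware one does not.
flipped = s_gravity([0, 0, 0], [1, 0, 1], [5, 5, 0], [4, 5, -1])
```

Separating the vertical and horizontal components is what lets a gravity prior reject pairs that a purely distance-based consistency check would accept, while the per-association scores down-weight matches between objects of different size or appearance.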