Robust Visual Localization in Compute-Constrained Environments by Salient Edge Rendering and Weighted Hamming Similarity

Tu-Hoa Pham, Philip Bailey, Daniel Posada, Georgios Georgakis, Jorge Enriquez, Surya Suresh, Marco Dolci, Philip Twu

TL;DR

This work addresses robust monocular 6-DoF object localization under severe compute and memory constraints by introducing a render-and-compare pipeline that relies on salient edge rendering and a novel edge-domain template matching metric. The core contributions are a geometry-first edge renderer, the Weighted Hamming Similarity (WHS) for robust template matching, and a comprehensive synthetic-plus-real dataset to validate performance under realistic Mars-like conditions. Empirical results show 100% localization success in synthetic and near-term real-world scenarios, with WHS displaying strong robustness to domain shifts and low-fidelity rendering, while maintaining feasibility on flight-grade hardware. The approach offers a practical, verifiable solution for autonomous on-board localization in resource-constrained space robotics and other compute-limited robotic systems.

Abstract

We consider the problem of vision-based 6-DoF object pose estimation in the context of the notional Mars Sample Return campaign, in which a robotic arm would need to localize multiple objects of interest for low-clearance pickup and insertion, under severely constrained hardware. We propose a novel localization algorithm leveraging a custom renderer together with a new template matching metric tailored to the edge domain to achieve robust pose estimation using only low-fidelity, textureless 3D models as inputs. Extensive evaluations on synthetic datasets, on physical testbeds on Earth, and on in situ Mars imagery show that our method consistently outperforms the state of the art in compute- and memory-constrained localization, in both robustness and accuracy, in turn enabling new possibilities for cheap and reliable localization on general-purpose hardware.

Paper Structure

This paper contains 16 sections, 4 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Top: concept view of the Perseverance rover meeting the Sample Return Lander (courtesy of NASA) to transfer samples from bit carousel (BC) to orbiting sample (OS) canister. Middle and bottom: seed (red) and pose from visual localization (green) on synthetic, testbed and Mars data.
  • Figure 2: Pose estimation pipeline. Top: input image with seed pose overlaid in red, after histogram equalization, and after Canny edge detection (notice the background noise). Bottom: template matching using Weighted Hamming Similarity against rendered salient edges, and estimated pose in green.
  • Figure 3: Three datasets for evaluation: Mars 2020 rover pointing its arm camera at the BC, far and near shot (top), lander OS testbed, left and right images (middle), virtual scene with lander cameras on rover BC, left and right renders (bottom).
  • Figure 4: Error distribution per method and dimension on synthetic data for rover BC (top) and lander OS (bottom).
  • Figure 5: WHS on ground sample (top), daily objects [icar:calli:2015] (middle) and transparent objects [ral:yu:2023] (bottom) with $3\,\mathrm{cm}$, $10^\circ$ seed error.
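
The Figure 2 caption describes matching a rendered edge template against a noisy Canny edge map. This excerpt does not give the paper's exact definition of Weighted Hamming Similarity, so the sketch below only illustrates the general idea with a generic weighted Hamming agreement score. The weighting scheme (template edge pixels count 1.0, background pixels 0.1), the exhaustive 2D translation search, and all function names are assumptions made for illustration, not the authors' method.

```python
from typing import List, Sequence, Tuple

Grid = Sequence[Sequence[int]]


def weighted_hamming_similarity(
    observed: Grid, template: Grid, weights: Sequence[Sequence[float]]
) -> float:
    """Weighted fraction of pixels on which two binary edge maps agree.

    Generic weighted Hamming agreement (an assumption, not the paper's formula).
    """
    agree = 0.0
    total = 0.0
    for obs_row, tpl_row, w_row in zip(observed, template, weights):
        for o, t, w in zip(obs_row, tpl_row, w_row):
            total += w
            if o == t:
                agree += w
    return agree / total if total > 0 else 0.0


def crop(image: Grid, top: int, left: int, height: int, width: int) -> List[List[int]]:
    """Extract a height x width window whose top-left corner is (top, left)."""
    return [list(row[left:left + width]) for row in image[top:top + height]]


def best_offset(
    image: Grid, template: Grid, weights: Sequence[Sequence[float]]
) -> Tuple[float, int, int]:
    """Slide the template over the image; return (best score, top, left)."""
    h, w = len(template), len(template[0])
    best = (-1.0, 0, 0)
    for top in range(len(image) - h + 1):
        for left in range(len(image[0]) - w + 1):
            score = weighted_hamming_similarity(
                crop(image, top, left, h, w), template, weights
            )
            if score > best[0]:
                best = (score, top, left)
    return best


# Hypothetical rendered edge template (an "L" shape).
template = [
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 1],
]
# Assumed weighting: template edges matter more than background pixels.
weights = [[1.0 if t else 0.1 for t in row] for row in template]

# Hypothetical detected edge map: the "L" at (top=1, left=2) plus one noise pixel.
image = [
    [0, 0, 0, 0, 0, 1],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]

result = best_offset(image, template, weights)
print(result)
```

In this toy setup the search recovers the offset at which the template was planted. Down-weighting background pixels is one plausible way to make the score less sensitive to spurious edges outside the object outline; a real pipeline would search over 6-DoF poses by re-rendering the template rather than over 2D shifts.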