GroundSLAM: A Robust Visual SLAM System for Warehouse Robots Using Ground Textures

Kuan Xu, Zheng Yang, Lihua Xie, Chen Wang

TL;DR

GroundSLAM introduces a robust 3-DOF visual SLAM system for warehouse robots that uses a downward-facing camera to exploit ground textures, addressing failures of forward-looking cameras in dynamic and textureless environments. It centers on a feature-free, image-level matching framework via a kernel cross-correlator (KCC) with a closed-form Fourier-domain solution, enabling reliable visual odometry, loop closure, and map reuse. The authors release PathTex, a 131k-image ground texture dataset with precise ground truth, and demonstrate through extensive experiments that GroundSLAM outperforms state-of-the-art ground-texture and monocular SLAM baselines across indoor and outdoor textures while maintaining real-time performance. Overall, GroundSLAM provides a low-cost, robust solution for drift-free, multi-robot localization in warehouses, with direct applicability to fleet-wide mapping and navigation.

Abstract

A robust visual localization and mapping system is essential for warehouse robot navigation, as cameras offer a more cost-effective alternative to LiDAR sensors. However, existing forward-facing camera systems often encounter challenges in dynamic environments and open spaces, leading to significant performance degradation during deployment. To address these limitations, a localization system utilizing a single downward-facing camera to capture ground textures presents a promising solution. Nevertheless, existing feature-based ground-texture localization methods face difficulties when operating on surfaces with sparse features or repetitive patterns. To address this limitation, we propose GroundSLAM, a novel feature-free and ground-texture-based simultaneous localization and mapping (SLAM) system. GroundSLAM consists of three components: feature-free visual odometry, ground-texture-based loop detection and map optimization, and map reuse. Specifically, we introduce a kernel cross-correlator (KCC) for image-level pose tracking, loop detection, and map reuse to improve localization accuracy and robustness, and incorporate adaptive pruning strategies to enhance efficiency. Due to these specific designs, GroundSLAM is able to deliver efficient and stable localization across various ground surfaces such as those with sparse features and repetitive patterns. To advance research in this area, we introduce the first ground-texture dataset with precise ground-truth poses, consisting of 131k images collected from 10 kinds of indoor and outdoor ground surfaces. Extensive experimental results show that GroundSLAM outperforms state-of-the-art methods for both indoor and outdoor localization. We release our code and dataset at https://github.com/sair-lab/GroundSLAM.
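
To make the idea of image-level, Fourier-domain matching concrete, the following is a minimal sketch of translation estimation between two ground images using plain cross-correlation computed with FFTs. It is not the authors' KCC (which adds a kernel and a learned correlator); the function name `estimate_shift` and the assumption of an integer circular shift are purely illustrative.

```python
import numpy as np


def estimate_shift(z: np.ndarray, x: np.ndarray) -> tuple[int, int]:
    """Estimate the integer shift (dy, dx) such that
    x is approximately np.roll(z, (dy, dx), axis=(0, 1)).

    Illustrative only: plain (linear-kernel) cross-correlation,
    assuming a circular shift between the two images.
    """
    Z = np.fft.fft2(z)
    X = np.fft.fft2(x)
    # Correlation in the spatial domain is an element-wise product
    # (with one factor conjugated) in the Fourier domain.
    corr = np.fft.ifft2(X * np.conj(Z)).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Peak indices wrap around; map them to signed shifts.
    h, w = z.shape
    dy = peak[0] if peak[0] <= h // 2 else peak[0] - h
    dx = peak[1] if peak[1] <= w // 2 else peak[1] - w
    return int(dy), int(dx)


# Usage: shift a random "texture" and recover the shift.
rng = np.random.default_rng(0)
keyframe = rng.standard_normal((64, 64))
current = np.roll(keyframe, (5, -7), axis=(0, 1))
shift = estimate_shift(keyframe, current)
```

Because the whole image contributes to the correlation peak, this style of matching does not depend on detecting individual keypoints, which is the property the abstract relies on for sparse or repetitive textures.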

Paper Structure

This paper contains 45 sections, 6 theorems, 35 equations, 14 figures, 7 tables, 1 algorithm.

Key Result

Lemma 1

Consider a function $\mathbf{f}$ and a transformation $\mathcal{T}$. The function $\mathbf{f}$ is equivariant to $\mathcal{T}$ if, for all inputs $\mathbf{x}$, $\mathbf{f}(\mathcal{T}(\mathbf{x})) = \mathcal{T}'(\mathbf{f}(\mathbf{x}))$, where $\mathcal{T}'$ might be the same as $\mathcal{T}$ or another related transformation. In simple terms, a function is said to be equivariant if, when the input changes in a certain way, the output changes in a predictable and corresponding manner.
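
As a concrete instance of equivariance as defined in Lemma 1, consider circular cross-correlation with a fixed template under circular shifts. The notation below ($S_s$ for a shift by $s$, $\star$ for circular cross-correlation, indices taken modulo $N$) is assumed for illustration and is not taken from the paper. In this case $\mathcal{T}' = \mathcal{T} = S_s$: shifting the input shifts the correlation output by the same amount.

```latex
% Assumed notation (not from the paper): S_s is a circular shift,
% \star is circular cross-correlation, indices are modulo N.
\begin{align*}
(S_s \mathbf{x})[n] &= \mathbf{x}[n - s], \\
\mathbf{f}(\mathbf{x})[\tau] &= (\mathbf{x} \star \mathbf{z})[\tau]
  = \sum_{m=0}^{N-1} \mathbf{x}[m + \tau]\, \mathbf{z}[m], \\
\mathbf{f}(S_s \mathbf{x})[\tau] &= \sum_{m=0}^{N-1} \mathbf{x}[m + \tau - s]\, \mathbf{z}[m]
  = \mathbf{f}(\mathbf{x})[\tau - s]
  = (S_s\, \mathbf{f}(\mathbf{x}))[\tau].
\end{align*}
```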

Figures (14)

  • Figure 1: In the warehouse, dynamic objects (robots and storage racks) and distant features make the localization with a forward-facing camera or LiDAR very challenging.
  • Figure 2: The pipeline of our GroundSLAM system. In the front-end, rotation and translation are decoupled and estimated using kernel cross-correlators. In the back-end, a keyframe-based map is maintained to facilitate loop closure detection and correction.
  • Figure 3: An illustration of relative transformation estimation using the proposed KCC. The keyframe and the current image are represented as $\mathbf{z}$ and $\mathbf{x}$, respectively. Their relative rotation $\mathbf{\theta}$ and translation $\mathbf{t}$ are estimated in a decoupled way.
  • Figure 4: Example ground texture images in our dataset. We collect 10 kinds of ground textures, including 6 kinds of outdoor textures and 4 kinds of indoor textures.
  • Figure 5: The comparison of data association of ORB, SIFT, and KCC on the HD Ground dataset. The numbers of features and matching inliers are given. For the KCC, the correlation results are projected onto three coordinate axes and represent the estimate of the 3-DOF movement. The vertical axis is the confidence of the estimated movement on the horizontal axis. The higher the peak relative to other positions, the greater the confidence of the motion estimate.
  • ...and 9 more figures
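
The Figure 5 caption reads the height of the correlation peak relative to other positions as a confidence measure. One common way to quantify that idea is the peak-to-sidelobe ratio (PSR) used in correlation-filter tracking. The sketch below is an assumption for illustration, not the paper's confidence score; the function name `peak_to_sidelobe_ratio` and the `exclude` window size are hypothetical.

```python
import numpy as np


def peak_to_sidelobe_ratio(response: np.ndarray, exclude: int = 2) -> float:
    """Score how sharply a 1-D correlation response peaks.

    Illustrative only: the peak height is compared with the mean and
    standard deviation of the response outside a small window around
    the peak (the "sidelobe").
    """
    response = np.asarray(response, dtype=float)
    peak_idx = int(np.argmax(response))
    peak = response[peak_idx]
    # Mask out the peak and its immediate neighbours.
    mask = np.ones(response.shape[0], dtype=bool)
    lo = max(0, peak_idx - exclude)
    hi = min(response.shape[0], peak_idx + exclude + 1)
    mask[lo:hi] = False
    sidelobe = response[mask]
    # Small constant avoids division by zero for a perfectly flat sidelobe.
    return float((peak - sidelobe.mean()) / (sidelobe.std() + 1e-12))


# Usage: a response with one clear peak scores far higher than pure noise.
rng = np.random.default_rng(0)
noise = 0.01 * rng.standard_normal(64)
sharp = noise.copy()
sharp[10] += 1.0
psr_sharp = peak_to_sidelobe_ratio(sharp)
psr_noise = peak_to_sidelobe_ratio(noise)
```

A threshold on such a score is one plausible way to reject unreliable matches, for example when deciding whether a loop-closure candidate should be accepted.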

Theorems & Definitions (6)

  • Lemma 1: Equivariance [gaunce2006equivariant]
  • Theorem 2: KCC Equivariance
  • Theorem 3: KCC
  • Theorem 4: Weighted Regularized KCC
  • Theorem 5
  • Lemma 6: Cross-Correlation Theorem [bracewell1986fourier]
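
The cross-correlation theorem named in Lemma 6 can be checked numerically. The sketch below uses one common convention for real 1-D signals, $c[\tau] = \sum_m x[m+\tau]\, z[m]$ with circular indexing, and compares a direct sum against the Fourier-domain product. The paper's exact statement and sign convention may differ, and the function names are illustrative.

```python
import numpy as np


def circular_xcorr_direct(x: np.ndarray, z: np.ndarray) -> np.ndarray:
    """Circular cross-correlation by direct summation, O(N^2)."""
    n = len(x)
    return np.array(
        [sum(x[(m + tau) % n] * z[m] for m in range(n)) for tau in range(n)]
    )


def circular_xcorr_fft(x: np.ndarray, z: np.ndarray) -> np.ndarray:
    """Same quantity via the Fourier domain, O(N log N).

    For real signals: DFT(c) = DFT(x) * conj(DFT(z)).
    """
    return np.fft.ifft(np.fft.fft(x) * np.conj(np.fft.fft(z))).real


# Usage: both routes agree up to floating-point error.
rng = np.random.default_rng(1)
x = rng.standard_normal(16)
z = rng.standard_normal(16)
direct = circular_xcorr_direct(x, z)
via_fft = circular_xcorr_fft(x, z)
```

The agreement between the two routes is what allows correlation over every candidate shift to be evaluated at once with a few FFTs instead of an explicit search.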