GroundSLAM: A Robust Visual SLAM System for Warehouse Robots Using Ground Textures
Kuan Xu, Zheng Yang, Lihua Xie, Chen Wang
TL;DR
GroundSLAM introduces a robust 3-DOF visual SLAM system for warehouse robots that uses a downward-facing camera to exploit ground textures, addressing failures of forward-looking cameras in dynamic and textureless environments. It centers on a feature-free, image-level matching framework via a kernel cross-correlator (KCC) with a closed-form Fourier-domain solution, enabling reliable visual odometry, loop closure, and map reuse. The authors release PathTex, a 131k-image ground texture dataset with precise ground truth, and demonstrate through extensive experiments that GroundSLAM outperforms state-of-the-art ground-texture and monocular SLAM baselines across indoor and outdoor textures while maintaining real-time performance. Overall, GroundSLAM provides a low-cost, robust solution for drift-free, multi-robot localization in warehouses, with direct applicability to fleet-wide mapping and navigation.
Abstract
A robust visual localization and mapping system is essential for warehouse robot navigation, and cameras offer a more cost-effective alternative to LiDAR sensors. However, existing forward-facing camera systems often struggle in dynamic environments and open spaces, which significantly degrades their performance in deployment. A localization system that uses a single downward-facing camera to capture ground textures is a promising way around these limitations. Existing feature-based ground-texture localization methods, however, have difficulty on surfaces with sparse features or repetitive patterns. We therefore propose GroundSLAM, a novel feature-free, ground-texture-based simultaneous localization and mapping (SLAM) system. GroundSLAM consists of three components: feature-free visual odometry, ground-texture-based loop detection and map optimization, and map reuse. Specifically, we introduce a kernel cross-correlator (KCC) for image-level pose tracking, loop detection, and map reuse to improve localization accuracy and robustness, and we incorporate adaptive pruning strategies to improve efficiency. With these designs, GroundSLAM delivers efficient and stable localization across a variety of ground surfaces, including those with sparse features and repetitive patterns. To advance research in this area, we introduce the first ground-texture dataset with precise ground-truth poses, consisting of 131k images collected from 10 kinds of indoor and outdoor ground surfaces. Extensive experimental results show that GroundSLAM outperforms state-of-the-art methods for both indoor and outdoor localization. We release our code and dataset at https://github.com/sair-lab/GroundSLAM.
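To give a feel for what feature-free, image-level matching in the Fourier domain looks like, the following is a minimal sketch and not the authors' KCC. It swaps in plain phase correlation, a simpler relative of correlator-based matching, and recovers only an integer translation between two ground patches; rotation, the kernel formulation, and adaptive pruning are left out. The function name `estimate_shift` and the use of the correlation peak as a rough match score are assumptions made for illustration.

```python
import numpy as np


def estimate_shift(ref, cur):
    """Estimate the integer (dy, dx) such that cur ~= np.roll(ref, (dy, dx), axis=(0, 1)).

    Illustrative stand-in (plain phase correlation), NOT the KCC from the paper.
    Assumes both images have the same shape and differ by a circular shift.
    """
    f_ref = np.fft.fft2(ref)
    f_cur = np.fft.fft2(cur)

    # Cross-power spectrum: its phase encodes the translation between the images.
    cross = f_cur * np.conj(f_ref)
    cross /= np.abs(cross) + 1e-12  # keep phase only; epsilon avoids division by zero

    # Back in the spatial domain the response peaks at the shift.
    corr = np.fft.ifft2(cross).real
    peak_idx = np.unravel_index(np.argmax(corr), corr.shape)

    # Indices past the midpoint correspond to negative shifts (FFT wrap-around).
    shift = tuple(
        int(p) if p <= n // 2 else int(p) - n
        for p, n in zip(peak_idx, corr.shape)
    )
    return shift, float(corr.max())


# Usage on a synthetic texture: shift a random patch and recover the offset.
rng = np.random.default_rng(0)
ref = rng.standard_normal((64, 64))
cur = np.roll(ref, (5, -3), axis=(0, 1))
shift, peak = estimate_shift(ref, cur)
```

Because the whole image contributes to the correlation response, no keypoints need to be detected, which is the intuition behind matching on surfaces with sparse features. A low peak value would suggest that the two patches do not overlap.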
