Robot Localization Using a Learned Keypoint Detector and Descriptor with a Floor Camera and a Feature Rich Industrial Floor

Piet Brömmel, Dominik Brämer, Oliver Urbann, Diana Kleingarn

TL;DR

KOALA introduces a ground-floor localization approach that uses a custom Segmentation Detector (SEG) and Similarity Descriptor (SIM) to localize a robot from floor images without markers. The framework includes a map creation and position estimation pipeline and is evaluated on a large, motion-capture-ground-truth dataset, achieving up to 75.7% true success with mean position error near 2 cm and rotation error around 2.4°. Ablation studies reveal a trade-off between rotation invariance and true success rate, while the compact 30-dimensional descriptor enables faster querying and smaller maps. The work demonstrates practical, marker-less localization on industrial floors and suggests SLAM extensions, real-time optimization, and cross-floor generalization as promising directions.
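The claim that a compact descriptor gives smaller maps and faster queries can be illustrated with some simple arithmetic. The sketch below is not the authors' implementation. It assumes float32 storage, a hypothetical map of one million keypoints, and a brute-force nearest-neighbour search. The 128-dimensional comparison is the SIFT descriptor size, chosen here only as a familiar reference point. The function names `map_size_bytes` and `nearest_descriptor` are made up for this example.

```python
import math


def map_size_bytes(num_keypoints, dim, bytes_per_value=4):
    """Bytes needed to store the descriptors alone.

    Assumes one float32 (4 bytes) per descriptor dimension; the storage
    format actually used by KOALA is not stated in this summary.
    """
    return num_keypoints * dim * bytes_per_value


def nearest_descriptor(query, map_descriptors):
    """Brute-force nearest neighbour by Euclidean distance.

    Returns (index, distance). The cost per query grows linearly with the
    descriptor dimension, which is why a shorter descriptor is cheaper.
    """
    best_index, best_dist = -1, math.inf
    for i, candidate in enumerate(map_descriptors):
        dist = math.dist(query, candidate)
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index, best_dist


# Hypothetical map size, chosen only for illustration.
NUM_KEYPOINTS = 1_000_000
size_30 = map_size_bytes(NUM_KEYPOINTS, 30)    # 30-dim descriptor
size_128 = map_size_bytes(NUM_KEYPOINTS, 128)  # 128-dim reference
print(size_30, size_128)
```

Under these assumptions, the 30-dimensional map needs 120 MB of descriptor storage compared with 512 MB for a 128-dimensional one, and each distance computation touches roughly a quarter as many values.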

Abstract

The localization of moving robots depends on the availability of good features from the environment. Sensor systems like Lidar are popular, but unique features can also be extracted from images of the ground. This work presents the Keypoint Localization Framework (KOALA), which uses deep neural networks to extract sufficient features from an industrial floor for accurate localization without readable markers. For this purpose, we use a floor covering that can be produced as cheaply as common industrial floors. Although we do not use any filtering, prior, or temporal information, we can estimate our position in 75.7 % of all images with a mean position error of 2 cm and a rotation error of 2.4°. Thus, the robot kidnapping problem can be solved with high precision in every frame, even while the robot is moving. Furthermore, we show that our framework with our detector and descriptor combination is able to outperform comparable approaches.
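Estimating a position and rotation from a single frame, with no filtering or prior, amounts to aligning keypoints seen in the image with their matches in the map. The abstract does not say how KOALA performs this step, so the following is only a minimal sketch of one standard option: the closed-form least-squares 2D rigid alignment. The function name `estimate_pose_2d` is hypothetical, both point sets are assumed to be in metric coordinates, and the matches are assumed to be correct. A practical pipeline would likely also need outlier rejection.

```python
import math


def estimate_pose_2d(image_pts, map_pts):
    """Least-squares rigid transform (theta, tx, ty) with R(theta) * p + t ~ q.

    image_pts: keypoints in the robot/camera frame, as (x, y) tuples.
    map_pts:   their matched map keypoints, in the same order.
    Assumes all matches are correct; no outlier handling is done here.
    """
    n = len(image_pts)
    if n < 2 or n != len(map_pts):
        raise ValueError("need at least two matched point pairs")

    # Centroids of both point sets.
    pcx = sum(p[0] for p in image_pts) / n
    pcy = sum(p[1] for p in image_pts) / n
    qcx = sum(q[0] for q in map_pts) / n
    qcy = sum(q[1] for q in map_pts) / n

    # Accumulate dot and cross products of the centred point pairs.
    s_cos = 0.0
    s_sin = 0.0
    for (px, py), (qx, qy) in zip(image_pts, map_pts):
        ax, ay = px - pcx, py - pcy
        bx, by = qx - qcx, qy - qcy
        s_cos += ax * bx + ay * by
        s_sin += ax * by - ay * bx

    theta = math.atan2(s_sin, s_cos)
    c, s = math.cos(theta), math.sin(theta)

    # Translation maps the rotated image centroid onto the map centroid.
    tx = qcx - (c * pcx - s * pcy)
    ty = qcy - (s * pcx + c * pcy)
    return theta, tx, ty


# Usage: synthetic matches generated from a known pose (made-up values).
true_theta, true_t = math.radians(30.0), (1.5, -0.7)
pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (0.3, 0.1)]
c0, s0 = math.cos(true_theta), math.sin(true_theta)
matched = [(c0 * x - s0 * y + true_t[0], s0 * x + c0 * y + true_t[1]) for x, y in pts]
theta, tx, ty = estimate_pose_2d(pts, matched)
print(math.degrees(theta), tx, ty)
```

Because the solution is closed-form and uses only the current frame's matches, it needs no prior pose, which fits the kidnapped-robot setting described above.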

Paper Structure

This paper contains 13 sections, 9 figures, 2 tables.

Figures (9)

  • Figure 1: A conceptual overview of our segmentation detector and similarity descriptor. Upper: Raw image segmented into an RGBW mask with detected keypoints. Lower: Keypoint patches are rotated uniformly and encoded into latent vectors by a pretrained encoder.
  • Figure 2: Overview of the experimentation hall showing the motion capture system and the industrial floor.
  • Figure 3: Modified DJI Robomaster S1 with a floor camera and markers for the motion capture system.
  • Figure 4: Left: a raw 4.95 x 2.80 floor image showing the RGBW pattern. Middle: the same image with brightness and contrast increased for clarity. Right: the segmentation mask of the image.
  • Figure 5: Keypoints extracted from a run before clustering (left) and after (right).
  • ...and 4 more figures