
GelSLAM: A Real-time, High-Fidelity, and Robust 3D Tactile SLAM System

Hung-Jui Huang, Mohammad Amin Mirzaee, Michael Kaess, Wenzhen Yuan

TL;DR

GelSLAM is a real-time 3D SLAM system that relies solely on tactile sensing to estimate object pose over long periods and to reconstruct object shapes with high fidelity; it tracks object motion in real time with low error and minimal drift.

Abstract

Accurately perceiving an object's pose and shape is essential for precise grasping and manipulation. Compared to common vision-based methods, tactile sensing offers advantages in precision and immunity to occlusion when tracking and reconstructing objects in contact. This makes it particularly valuable for in-hand and other high-precision manipulation tasks. In this work, we present GelSLAM, a real-time 3D SLAM system that relies solely on tactile sensing to estimate object pose over long periods and reconstruct object shapes with high fidelity. Unlike traditional point cloud-based approaches, GelSLAM uses tactile-derived surface normals and curvatures for robust tracking and loop closure. It can track object motion in real time with low error and minimal drift, and reconstruct shapes with submillimeter accuracy, even for low-texture objects such as wooden tools. GelSLAM extends tactile sensing beyond local contact to enable global, long-horizon spatial perception, and we believe it will serve as a foundation for many precise manipulation tasks involving interaction with objects in hand. The video demo, code, and dataset are available at https://joehjhuang.github.io/gelslam.
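The abstract says GelSLAM tracks objects using tactile-derived surface normals and curvatures. Below is a minimal sketch of one standard way to turn a per-pixel normal map into a curvature map: take half the divergence of the in-plane normal components. The helper name `curvature_from_normals`, the (H, W, 3) array layout, the `pixel_size` parameter, and the sign convention (convex bumps positive) are our assumptions. This is not GelSLAM's code or its exact curvature definition.

```python
import numpy as np


def curvature_from_normals(normals: np.ndarray, pixel_size: float) -> np.ndarray:
    """Approximate mean curvature from a unit normal map.

    Hypothetical helper (not GelSLAM's API). `normals` has shape (H, W, 3);
    rows index y, columns index x, and normals follow
    n = (-dz/dx, -dz/dy, 1) / norm for a height map z(x, y).
    `pixel_size` is the metric size of one pixel.
    Assumed sign convention: convex bumps yield positive curvature.
    """
    nx = normals[..., 0]
    ny = normals[..., 1]
    # Central differences in the interior, one-sided differences at the borders.
    dnx_dx = np.gradient(nx, pixel_size, axis=1)
    dny_dy = np.gradient(ny, pixel_size, axis=0)
    # Half the divergence of the in-plane normal field.
    return 0.5 * (dnx_dx + dny_dy)


if __name__ == "__main__":
    # Synthetic check: a spherical cap of radius R has mean curvature 1 / R.
    R, px = 10.0, 0.05  # illustrative values in millimetres
    coords = np.arange(-40, 41) * px
    X, Y = np.meshgrid(coords, coords)  # X varies along columns, Y along rows
    nz = np.sqrt(1.0 - (X**2 + Y**2) / R**2)
    normals = np.stack([X / R, Y / R, nz], axis=-1)
    kappa = curvature_from_normals(normals, px)
    print(kappa.mean())  # ≈ 0.1, i.e. 1 / R
```

Working from normals rather than a reconstructed point cloud avoids integrating the normal field into heights first. That is consistent with the abstract's contrast to point-cloud-based approaches, though the paper's own curvature computation may differ.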

Paper Structure

This paper contains 38 sections, 6 equations, 21 figures, 6 tables.

Figures (21)

  • Figure 1: GelSLAM enables robust, high-fidelity object-level 3D reconstruction and real-time, accurate long-horizon object tracking using only tactile sensing, as shown in (a). (b) Reconstruction results on a wide variety of objects, including small items like almonds and peanuts, low-texture objects such as the handle of pliers, and large objects like a tree trunk.
  • Figure 2: GelSLAM pipeline: GelSight image streams are first processed by the tracking module to estimate object poses and select keyframes along the trajectory. Each new keyframe is passed to the loop closure module to identify revisits (loops). A globally consistent trajectory is then computed by optimizing a pose graph that combines tracking and loop information. The reconstruction module registers local tactile patches using the optimized poses and fuses them into the final 3D model.
  • Figure 3: Loop detection pipeline. SIFT features on curvature maps (second column) are first matched to estimate an initial relative transformation, which is then refined by NormalFlow (third column). If the warped curvature maps (fourth column) are well aligned and the shared contact region is sufficiently large, the loop is accepted. This verification is performed by NormalFlow’s failure detection process, which evaluates the CCS and SCR scores.
  • Figure 4: Reconstruction pipeline. Naively pasting tactile meshes at their estimated poses can introduce artifacts (naive fusion). We address this with a fast fusion step, which can run in real time to provide online reconstruction feedback, followed by an offline re-meshing step to produce a watertight final model.
  • Figure 5: Tracking data collection setup: the object is clamped to the table, and the GelSight sensor is tracked using MoCap.
  • ...and 16 more figures
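Figure 2 describes computing a globally consistent trajectory by optimizing a pose graph that combines tracking and loop information. The sketch below shows that idea on a deliberately simplified problem. It optimizes positions only, with no rotations, so each edge x_j - x_i ≈ delta is linear and the whole graph solves in one weighted least-squares step. GelSLAM optimizes full object poses. The function `optimize_translation_graph`, the edge tuple format, and the weights are hypothetical.

```python
import numpy as np


def optimize_translation_graph(num_nodes, edges):
    """Solve a translation-only pose graph by weighted linear least squares.

    Hypothetical helper. Each edge is (i, j, delta, weight) and encodes the
    measurement x_j - x_i ≈ delta (a 3-vector). Node 0 is anchored at the
    origin to remove the global translation ambiguity.
    """
    A = np.zeros((len(edges) + 1, num_nodes))
    b = np.zeros((len(edges) + 1, 3))
    for row, (i, j, delta, weight) in enumerate(edges):
        s = np.sqrt(weight)  # scale rows so squared residuals carry the weight
        A[row, j] = s
        A[row, i] = -s
        b[row] = s * np.asarray(delta, dtype=float)
    A[-1, 0] = 1.0  # anchor row: x_0 = 0
    # Without rotations the x, y, z axes decouple, so one lstsq call solves
    # all three at once (one column of b per axis).
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x  # shape (num_nodes, 3)


if __name__ == "__main__":
    # Square trajectory: odometry drifts slightly, the loop closure is trusted.
    odometry = [
        (0, 1, [1.0, 0.1, 0.0], 1.0),
        (1, 2, [0.1, 1.0, 0.0], 1.0),
        (2, 3, [-1.0, 0.1, 0.0], 1.0),
    ]
    loop = [(3, 0, [0.0, -1.0, 0.0], 100.0)]
    dead_reckoned = np.cumsum([[0.0, 0.0, 0.0]] + [e[2] for e in odometry], axis=0)
    optimized = optimize_translation_graph(4, odometry + loop)
    print(dead_reckoned[3], optimized[3])  # the loop pulls node 3 toward (0, 1, 0)
```

The heavily weighted loop edge absorbs almost none of the accumulated error. The optimizer instead spreads the drift across the less trusted odometry edges, which is the qualitative effect loop closure has on a tracked trajectory.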
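Figure 3 accepts a loop only when the warped curvature maps are well aligned and the shared contact region is sufficiently large, as judged by NormalFlow's CCS and SCR scores. Those scores are not defined on this page. The sketch below therefore uses assumed stand-ins rather than NormalFlow's formulas. Alignment is measured as the cosine similarity of curvature values over the shared contact pixels. Overlap is the fraction of the smaller contact region that is shared. The function `accept_loop`, its mask inputs, and both thresholds are hypothetical.

```python
import numpy as np


def accept_loop(curv_a, mask_a, curv_b_warped, mask_b_warped,
                min_similarity=0.8, min_overlap=0.5):
    """Hypothetical loop-verification rule (not NormalFlow's CCS/SCR formulas).

    curv_a, curv_b_warped: (H, W) curvature maps, b already warped into a's frame.
    mask_a, mask_b_warped: (H, W) boolean contact masks in the same frame.
    """
    shared = mask_a & mask_b_warped
    smaller = min(mask_a.sum(), mask_b_warped.sum())
    if smaller == 0 or shared.sum() == 0:
        return False
    # Stand-in for "shared contact region is sufficiently large".
    overlap = shared.sum() / smaller
    a = curv_a[shared]
    b = curv_b_warped[shared]
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0:
        return False
    # Stand-in for "well aligned": cosine similarity over shared pixels.
    similarity = float(a @ b) / denom
    return bool(similarity >= min_similarity and overlap >= min_overlap)
```

Requiring both conditions reflects the caption's logic. A high similarity over a tiny overlap, or a large overlap with mismatched curvature, would each suggest a false loop.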