Efficient Visuo-Haptic Object Shape Completion for Robot Manipulation

Lukas Rustler; Jiri Matas; Matej Hoffmann

TL;DR

VISHAC addresses the problem of obtaining complete object shapes for robotic grasping by combining visual data with exploratory touch in a closed-loop pipeline. It extends implicit surface learning (IGR) with a theoretically grounded, uncertainty-driven touch strategy, free-space modeling, and multi-object handling, and it produces faster and more accurate reconstructions than prior baselines. On the real robot, average grasp success rises from 63.3% to 70.4% after one exploratory touch and to 82.7% after five touches. The method is demonstrated in simulation and in the real world, in scenes with multiple objects and with challenging objects such as transparent bottles. Shapes are represented as neural implicit surfaces, and the object pose and input data are updated after each interaction, so reconstruction continues to work when a touch displaces the object. The collected data and code are publicly available.

Abstract

For robot manipulation, a complete and accurate object shape is desirable. Here, we present a method that combines visual and haptic reconstruction in a closed-loop pipeline. From an initial viewpoint, the object shape is reconstructed using an implicit surface deep neural network. The location with the highest uncertainty is selected for haptic exploration and the object is touched. The new information from the touch and a new point cloud from the camera are then added, the object position is re-estimated, and the cycle is repeated. We extend Rustler et al. (2022) by using a new, theoretically grounded method to determine the points with the highest uncertainty. We also increase the yield of every haptic exploration by adding to the point cloud not only the contact points but also the empty space established by the robot's movement toward the object. Additionally, the solution is compact in that the jaws of a closed two-finger gripper are used directly for exploration. The object position is re-estimated after every robot action, and multiple objects can be present simultaneously on the table. We achieve a steady improvement with every touch on three different metrics and demonstrate the utility of the better shape reconstruction in grasping experiments on the real robot. On average, the grasp success rate increases from 63.3% to 70.4% after a single exploratory touch and to 82.7% after five touches. The collected data and code are publicly available (https://osf.io/j6rkd/, https://github.com/ctu-vras/vishac).
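The closed loop described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: `complete_shape` is a hypothetical stand-in for the implicit-surface network (it fits a sphere to the observed points and scores uncertainty as the distance to the nearest observation), `touch` simulates contact with a known sphere, and pose re-estimation, the new camera image, and free-space points are left out for brevity.

```python
import numpy as np

def complete_shape(points, rng, n_candidates=256):
    """Hypothetical stand-in for the shape-completion network.

    Fits a sphere to the observed points and returns candidate surface
    points with an uncertainty score (distance to the nearest observation).
    """
    center = points.mean(axis=0)
    radius = np.linalg.norm(points - center, axis=1).mean()
    dirs = rng.normal(size=(n_candidates, 3))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    surface = center + radius * dirs
    dists = np.linalg.norm(surface[:, None, :] - points[None, :, :], axis=2)
    return surface, dists.min(axis=1)

def touch(target, true_center, true_radius):
    """Simulated touch: contact point on the true sphere in the target direction."""
    direction = target - true_center
    direction /= np.linalg.norm(direction)
    return true_center + true_radius * direction

def explore(points, n_touches, true_center, true_radius, seed=0):
    """Reconstruct, touch the most uncertain point, add the contact, repeat."""
    rng = np.random.default_rng(seed)
    max_uncertainty = []
    for _ in range(n_touches):
        surface, uncertainty = complete_shape(points, rng)
        idx = int(np.argmax(uncertainty))      # most uncertain location
        max_uncertainty.append(float(uncertainty[idx]))
        contact = touch(surface[idx], true_center, true_radius)
        points = np.vstack([points, contact])  # haptic data joins the point cloud
    return points, max_uncertainty

# Usage: a unit sphere seen from one side (only points with x > 0 are visible).
rng = np.random.default_rng(1)
sphere = rng.normal(size=(400, 3))
sphere /= np.linalg.norm(sphere, axis=1, keepdims=True)
visible = sphere[sphere[:, 0] > 0]
completed, max_uncertainty = explore(visible, 5, np.zeros(3), 1.0)
```

In this toy setting the touches tend to land on the unseen side of the sphere, which is the intuition behind selecting the highest-uncertainty location for exploration.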

Paper Structure (23 sections, 8 equations, 9 figures, 2 algorithms)

Introduction
Related work
Visual-only Shape Completion
Haptic-Only Shape Completion
Visuo-Haptic Shape Completion
Method
Implicit Surfaces
Implicit Geometric Regularization for Learning Shapes
IGR Modifications -- Sampling and Free Space
Object Representation from Visual and Haptic Data
Object Shape Uncertainty
Segmentation of Multiple Objects
Pose Estimation
VISHAC Algorithm
Experiments and Results
...and 8 more sections

Figures (9)

Figure 1: Schematic operation of VISHAC. An initial RGB-D image of the scene is captured (1), a transformation $\mathbf{R}_0$ from the robot base to the object is obtained, and the object is segmented and converted into a point cloud $\mathcal{X}$ (2). Iterative reconstruction: in each step, $n=0:(N-1)$, the point cloud is fed into a neural network (3) and a completed shape $\mathbf{O}_n$ is created (4). The most uncertain point $\mathbf{p}_n$ is selected for touch (5). After contact, the object may have been displaced, giving rise to a new transformation $\mathbf{R}_n$. Haptic data $\mathbf{h}_n$ from the contact (6) and visual data $\mathbf{v}$ from a new RGB-D image (7) are collected. The transformation $\mathbf{R}_n$ is computed by pose estimation (8), and the new data, transformed into the original frame $\mathbf{R}_0$, are added to $\mathcal{X}$. See the VISHAC Algorithm section for details.
Figure 2: The real-world robot setup with the Kinova Gen3 robot, Robotiq 2F-85 gripper, external RGB-D camera, and all objects used. The closed gripper was used for haptic exploration, the open gripper for grasping.
Figure 3: Simulation -- reconstruction -- 1 object in scene. Average reconstruction accuracy (8 objects, 3 repetitions each). Numbers in each datapoint -- number of touches. Shaded areas -- standard deviation. JS (Jaccard similarity): higher values are better. CD (Chamfer distance): lower values are better.
Figure 4: Simulation and real experiments. Mean area of meshes. Numbers in each datapoint -- number of touches. Single -- scenes with only a single object; multi -- scenes with more objects. Act-VH is a baseline from Rustler et al. (2022), and Act-VH - new data is the same method evaluated on data collected in this work.
Figure 5: Simulation -- reconstruction -- multiple objects in scene. Average reconstruction accuracy (5 scenes, 3 repetitions each). Numbers in each datapoint -- number of touches. Shaded areas -- standard deviation. JS (Jaccard similarity): higher values are better. CD (Chamfer distance): lower values are better.
...and 4 more figures
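
The Figure 1 caption states that data collected after the object has moved are transformed into the original frame $\mathbf{R}_0$ before being added to $\mathcal{X}$. The sketch below shows one way such a mapping could look, assuming both poses are 4x4 homogeneous transforms from the object frame to the robot base frame. The helper names (`make_pose`, `apply`, `to_original_frame`) and the pose values are hypothetical and do not come from the paper or its code.

```python
import numpy as np

def make_pose(yaw, translation):
    """Build a 4x4 object-to-base transform from a yaw angle and a translation."""
    c, s = np.cos(yaw), np.sin(yaw)
    T = np.eye(4)
    T[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    T[:3, 3] = translation
    return T

def apply(T, points):
    """Apply a 4x4 homogeneous transform to an (N, 3) array of points."""
    homog = np.hstack([points, np.ones((len(points), 1))])
    return (T @ homog.T).T[:, :3]

def to_original_frame(points_base, T_0, T_n):
    """Map points measured at step n to where they would lie at the original pose.

    Assumption: T_0 and T_n are object-to-base poses, so T_n^{-1} moves the
    points into the object frame and T_0 places them at the original pose.
    """
    return apply(T_0 @ np.linalg.inv(T_n), points_base)

# Usage: the object is pushed and slightly rotated by a touch.
object_points = np.array([[0.05, 0.0, 0.0], [0.0, 0.03, 0.1], [-0.02, -0.02, 0.04]])
T_0 = make_pose(0.0, [0.5, 0.0, 0.0])    # pose before any touch
T_n = make_pose(0.3, [0.52, 0.01, 0.0])  # pose after the n-th touch
measured_n = apply(T_n, object_points)   # what the sensors see at step n
restored = to_original_frame(measured_n, T_0, T_n)
```

With the points restored this way, measurements from every step share one frame and can be concatenated into a single point cloud.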
