Efficient Visuo-Haptic Object Shape Completion for Robot Manipulation
Lukas Rustler, Jiri Matas, Matej Hoffmann
TL;DR
VISHAC addresses the problem of obtaining complete object shapes for robotic grasping by combining visual data with exploratory haptics in a closed-loop pipeline. It extends implicit surface learning (IGR) with a theoretically grounded, uncertainty-driven touch strategy, free-space modeling, and multi-object handling, achieving faster and more accurate reconstructions than prior baselines. The approach raises grasp success rate from 63.3% to 70.4% after one touch and to 82.7% after five touches, and it is demonstrated both in simulation and on a real robot, with multiple objects on the table and with difficult objects such as transparent bottles. By representing shapes as neural implicit surfaces and updating the object pose and the data after each interaction, VISHAC supports robust manipulation in cluttered and dynamic scenes. The data and code are publicly available for reproducibility and extension. The results underscore the value of coupling uncertainty-based tactile exploration with high-fidelity implicit representations for practical robotic grasping.
Abstract
For robot manipulation, a complete and accurate object shape is desirable. Here, we present a method that combines visual and haptic reconstruction in a closed-loop pipeline. From an initial viewpoint, the object shape is reconstructed using an implicit surface deep neural network. The location with the highest uncertainty is selected for haptic exploration and the object is touched. The new information from the touch and a new point cloud from the camera are then added, the object position is re-estimated, and the cycle is repeated. We extend Rustler et al. (2022) by using a new, theoretically grounded method to determine the points with the highest uncertainty, and we increase the yield of every haptic exploration by adding to the point cloud not only the contact points but also the empty space established by the robot's movement toward the object. Additionally, the solution is compact in that the jaws of a closed two-finger gripper are used directly for exploration. The object position is re-estimated after every robot action, and multiple objects can be present simultaneously on the table. We achieve a steady improvement with every touch on three different metrics and demonstrate the utility of the better shape reconstruction in grasping experiments on the real robot. On average, the grasp success rate increases from 63.3% to 70.4% after a single exploratory touch and to 82.7% after five touches. The collected data and code are publicly available (https://osf.io/j6rkd/, https://github.com/ctu-vras/vishac).
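The closed loop described in the abstract can be illustrated with a minimal toy sketch. It is not the authors' implementation: the implicit surface network and the pose re-estimation are omitted, the object is assumed to be a 2D unit circle centered at the origin, and uncertainty is replaced by a simple stand-in (distance to the nearest observed point) rather than the theoretically grounded measure used in the paper. All function names and the coverage_error metric are hypothetical and chosen only for illustration.

```python
import math


def nearest_distance(point, points):
    """Distance from `point` to the closest point in `points`."""
    return min(math.dist(point, q) for q in points)


def select_touch_target(candidates, observed):
    """Pick the candidate with the highest stand-in uncertainty.

    Assumption: uncertainty is approximated by the distance to the nearest
    observed point. The paper uses a different, theoretically grounded measure.
    """
    return max(candidates, key=lambda c: nearest_distance(c, observed))


def free_space_samples(approach_start, contact, n=5):
    """Sample n points strictly between the approach start and the contact.

    The gripper travelled through these points without contact, so they are
    known to be empty.
    """
    return [
        tuple(s + (c - s) * k / (n + 1) for s, c in zip(approach_start, contact))
        for k in range(1, n + 1)
    ]


def coverage_error(surface, observed):
    """Hypothetical metric: worst-case distance from the surface to the data."""
    return max(nearest_distance(p, observed) for p in surface)


def explore(surface, observed, touches=5):
    """Run the touch loop and record the coverage error after every touch."""
    observed = list(observed)
    free_space = []
    errors = [coverage_error(surface, observed)]
    for _ in range(touches):
        target = select_touch_target(surface, observed)
        # Toy assumption: approach radially from outside the unit circle.
        approach_start = tuple(2.0 * c for c in target)
        free_space.extend(free_space_samples(approach_start, target))
        observed.append(target)  # simulated contact point
        errors.append(coverage_error(surface, observed))
    return observed, free_space, errors


# Toy object: 36 points on a unit circle; the camera sees only the first 19.
surface = [
    (math.cos(math.radians(10 * k)), math.sin(math.radians(10 * k)))
    for k in range(36)
]
visible = surface[:19]
observed, free_space, errors = explore(surface, visible, touches=5)
print([round(e, 3) for e in errors])
```

In this toy setting each touch adds one contact point and five free-space samples, and the recorded error never increases because observations are only ever added. In the actual pipeline, the contact points and free-space points would be fed back into the implicit surface reconstruction together with a new camera point cloud.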
