Ensuring Force Safety in Vision-Guided Robotic Manipulation via Implicit Tactile Calibration
Lai Wei, Jiahua Ma, Yibo Hu, Ruimao Zhang
TL;DR
The paper introduces SafeDiff, a diffusion-based framework that implicitly calibrates vision-guided robot states with real-time tactile feedback to ensure force safety in door-opening manipulation. It blends Vision-Guided Mapping Modules with Tactile-Guided Calibration Modules in a multi-scale diffusion network to produce safe state sequences conditioned on visual context and force signals. A large multimodal dataset, SafeDoorManip50k, supports training and evaluation, and a novel force-safety benchmark with simulation and real-world experiments demonstrates improved safety, robustness to disturbances, and effective sim-to-real transfer. The work provides a practical, data-driven approach to force-safe manipulation and offers new benchmarks and resources for future research.
Abstract
In dynamic environments, robots often encounter constrained movement trajectories when manipulating objects with specific properties, such as doors. Applying an appropriate force is therefore crucial to prevent damage to both the robot and the object. However, current vision-guided robot state generation methods often falter in this regard because they do not integrate tactile perception. To tackle this issue, this paper introduces a novel state diffusion framework termed SafeDiff. It generates a prospective state sequence from the current robot state and visual context observation, and incorporates real-time tactile feedback to refine that sequence. As far as we know, this is the first study specifically focused on ensuring force safety in robotic manipulation. The tactile refinement significantly enhances the rationality of state planning, and the safe action trajectory is derived from inverse dynamics based on the refined plan. In practice, unlike previous approaches that concatenate visual and tactile data to generate future robot state sequences, our method uses tactile data as a calibration signal that implicitly adjusts the robot's state within the state space. Additionally, we have developed a large-scale simulation dataset called SafeDoorManip50k, which offers extensive multimodal data to train and evaluate the proposed method. Extensive experiments show that our visual-tactile model substantially mitigates the risk of harmful forces during door opening, in both simulated and real-world settings.
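The distinction the abstract draws between concatenating modalities and using tactile data as a calibration signal can be illustrated with a toy sketch. This is not the SafeDiff architecture, which is a multi-scale diffusion network; the function names, the linear maps, the tanh-bounded residual, and all dimensions below are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def vision_guided_states(visual_feat, W_v):
    """Map visual context features to a provisional state sequence (T x D).
    Hypothetical stand-in for a vision-guided mapping step."""
    return visual_feat @ W_v

def tactile_calibration(states, tactile, W_t, scale=0.1):
    """Adjust provisional states with a bounded residual computed from
    tactile readings. Hypothetical stand-in for a calibration step."""
    correction = np.tanh(tactile @ W_t)  # each entry lies in [-1, 1]
    return states + scale * correction

# Illustrative sizes: horizon, visual dim, force/torque dim, state dim.
T, V, F, D = 8, 16, 6, 7
visual_feat = rng.normal(size=(T, V))
tactile = rng.normal(size=(T, F))
W_v = rng.normal(size=(V, D))
W_t = rng.normal(size=(F, D))

# Calibration style: vision proposes states, touch corrects them.
provisional = vision_guided_states(visual_feat, W_v)
calibrated = tactile_calibration(provisional, tactile, W_t)

# Concatenation style, for contrast: both modalities feed one joint map.
W_cat = rng.normal(size=(V + F, D))
concat_states = np.concatenate([visual_feat, tactile], axis=1) @ W_cat
```

In the calibration style, the tactile signal can only shift the vision-derived states by a bounded amount, and a zero tactile reading leaves them unchanged. In the concatenation style, tactile and visual inputs are mixed in a single mapping with no such separation of roles.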
