Sim-To-Real Transfer for Visual Reinforcement Learning of Deformable Object Manipulation for Robot-Assisted Surgery
Paul Maria Scheikl, Eleonora Tagliabue, Balázs Gyenes, Martin Wagner, Diego Dall'Alba, Paolo Fiorini, Franziska Mathis-Ullrich
TL;DR
We address the sim-to-real transfer problem for visuomotor policies in robot-assisted surgery by introducing an image-based reinforcement learning pipeline that uses pixel-level, unpaired image-to-image translation with contrastive learning to bridge the visual domain gap. The policy is trained in a soft-tissue simulation on translated observations and deployed on a real robotic system without retraining, achieving a 50% real-world success rate on a deformable tissue retraction task. Key contributions include demonstrating the first visual sim-to-real transfer for deformable object manipulation in surgery, showing that data-efficient image translation can avoid task-specific auxiliary tasks, and highlighting the practical challenges that dynamics mismatch and camera calibration pose for clinical translation. This work advances cognitive surgical robotics by offering a scalable, image-based approach to closing the sim-to-real gap that requires less data and uses task-agnostic translation techniques.
Abstract
Automation holds the potential to assist surgeons in robotic interventions, shifting their mental workload from visuomotor control to high-level decision-making. Reinforcement learning has shown promising results in learning complex visuomotor policies, especially in simulation environments where many samples can be collected at low cost. A core challenge is overcoming the sim-to-real gap, that is, learning policies in simulation that can be deployed in the real world. In this work, we bridge the visual sim-to-real gap with an image-based reinforcement learning pipeline based on pixel-level domain adaptation and demonstrate its effectiveness on an image-based deformable object manipulation task. We choose tissue retraction because of its importance in the clinical reality of precise cancer surgery. After training in simulation on domain-translated images, our policy requires no retraining to perform tissue retraction with a 50% success rate on the real robotic system using raw RGB images. Furthermore, our sim-to-real transfer method makes no assumptions about the task itself and requires no paired images. This work presents the first successful application of visual sim-to-real transfer to robotic manipulation of deformable objects in the surgical field, which represents a notable step towards the clinical translation of cognitive surgical robotics.
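The abstract does not state the exact objective used for unpaired image-to-image translation with contrastive learning. As an assumption for illustration only, one widely used formulation is the patch-wise InfoNCE ("PatchNCE") loss from Contrastive Unpaired Translation (CUT): a feature of a patch in the translated image should match the feature of the patch at the same location in the input image (the positive) and differ from features of other patches (the negatives). The sketch below computes that loss on precomputed patch features with NumPy; the function name `patch_nce_loss`, the feature shapes, and the temperature of 0.07 are hypothetical choices, not details taken from this work.

```python
import numpy as np


def patch_nce_loss(feat_q: np.ndarray, feat_k: np.ndarray, tau: float = 0.07) -> float:
    """Hypothetical PatchNCE-style contrastive loss (a sketch, not the paper's code).

    feat_q: (N, C) features of N patches from the translated (fake real) image.
    feat_k: (N, C) features of the N patches at the same locations in the
            source (simulated) image. Row i of feat_k is the positive for
            row i of feat_q; all other rows act as negatives.
    tau:    softmax temperature (assumed value).
    """
    # L2-normalize so dot products are cosine similarities.
    q = feat_q / np.linalg.norm(feat_q, axis=1, keepdims=True)
    k = feat_k / np.linalg.norm(feat_k, axis=1, keepdims=True)

    # (N, N) similarity logits; the diagonal holds the positive pairs.
    logits = q @ k.T / tau

    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Cross-entropy with the matching patch as the target class.
    return float(-np.mean(np.diag(log_prob)))


if __name__ == "__main__":
    feats = np.eye(4)
    # Features that agree patch-by-patch give a near-zero loss ...
    print(patch_nce_loss(feats, feats))
    # ... while mismatched patch correspondences are penalized heavily.
    print(patch_nce_loss(feats, feats[::-1]))
```

Under this assumed formulation, no paired sim/real images are needed: the positives come from the same image before and after translation, which is consistent with the abstract's claim that the method requires no paired images.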
