Sim-To-Real Transfer for Visual Reinforcement Learning of Deformable Object Manipulation for Robot-Assisted Surgery

Paul Maria Scheikl, Eleonora Tagliabue, Balázs Gyenes, Martin Wagner, Diego Dall'Alba, Paolo Fiorini, Franziska Mathis-Ullrich

TL;DR

We address the sim-to-real transfer problem for visuomotor policies in robot-assisted surgery by introducing an image-based reinforcement learning pipeline that uses pixel-level, unpaired image-to-image translation with contrastive learning to bridge the visual domain gap. The policy is trained in a soft-tissue simulation on translated observations and deployed on a real robotic system without retraining, achieving a $50\%$ real-world success rate in a deformable tissue retraction task. Key contributions include demonstrating the first visual sim-to-real transfer for deformable surgical manipulation, showing data-efficient translation that avoids task-specific auxiliary tasks, and highlighting the practical challenges of dynamics mismatch and camera calibration for clinical translation. This work advances cognitive surgical robotics by offering a scalable, image-based approach to closing the sim-to-real gap with reduced data requirements and task-agnostic translation techniques.

Abstract

Automation holds the potential to assist surgeons in robotic interventions, shifting their mental workload from visuomotor control to high-level decision making. Reinforcement learning has shown promising results in learning complex visuomotor policies, especially in simulation environments where many samples can be collected at low cost. A core challenge is learning policies in simulation that can be deployed in the real world, thereby overcoming the sim-to-real gap. In this work, we bridge the visual sim-to-real gap with an image-based reinforcement learning pipeline based on pixel-level domain adaptation and demonstrate its effectiveness on an image-based task in deformable object manipulation. We choose a tissue retraction task because of its importance in the clinical reality of precise cancer surgery. After training in simulation on domain-translated images, our policy requires no retraining to perform tissue retraction with a 50% success rate on the real robotic system using raw RGB images. Furthermore, our sim-to-real transfer method makes no assumptions about the task itself and requires no paired images. This work introduces the first successful application of visual sim-to-real transfer for robotic manipulation of deformable objects in the surgical field, which represents a notable step towards the clinical translation of cognitive surgical robotics.
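As a rough illustration of training "in simulation on domain-translated images", the sketch below wraps a simulated environment so that the agent only ever sees translated observations. It is a minimal sketch under stated assumptions, not the authors' implementation: `TranslatedObservationEnv`, `ToySimEnv`, and `toy_translate` are hypothetical names, the environment interface (`reset`, `step`) is assumed, and an integer shift stands in for the learned image-to-image translation model.

```python
from typing import Callable, List, Tuple

Observation = List[int]  # hypothetical stand-in for an RGB image


class TranslatedObservationEnv:
    """Wraps a simulated environment so the agent only sees translate(o_s)."""

    def __init__(self, sim_env, translate: Callable[[Observation], Observation]):
        self.sim_env = sim_env
        self.translate = translate

    def reset(self) -> Observation:
        # Translate the initial simulated observation before returning it.
        return self.translate(self.sim_env.reset())

    def step(self, action: int) -> Tuple[Observation, float, bool]:
        # Step the simulation, then translate the resulting observation.
        o_s, reward, done = self.sim_env.step(action)
        return self.translate(o_s), reward, done


class ToySimEnv:
    """Hypothetical toy simulation: a one-pixel 'image' encoding gripper height."""

    def __init__(self) -> None:
        self.height = 0

    def reset(self) -> Observation:
        self.height = 0
        return [self.height]

    def step(self, action: int) -> Tuple[Observation, float, bool]:
        self.height += action  # the action is a delta, as in the retraction task
        done = self.height >= 3
        return [self.height], (1.0 if done else 0.0), done


def toy_translate(o_s: Observation) -> Observation:
    # Assumption: a fixed shift replaces the learned translation model.
    return [value + 100 for value in o_s]


if __name__ == "__main__":
    env = TranslatedObservationEnv(ToySimEnv(), toy_translate)
    observation = env.reset()
    done = False
    while not done:
        observation, reward, done = env.step(1)
    print("final translated observation:", observation, "reward:", reward)
```

Because the translation sits in a wrapper, the reinforcement learning code itself stays unchanged, and deployment on the real system simply omits the wrapper and feeds camera images to the policy directly.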

Paper Structure (20 sections, 5 equations, 8 figures, 1 table)

Figures (8)

  • Figure 1: Experimental setup for tissue retraction. We combine an Intel RealSense camera and the da Vinci Research Kit with a ProGrasp instrument grasping a yellow sheet of silicone attached to a board.
  • Figure 2: Overview of training and evaluation settings. A policy $\pi$ is trained in simulation on translated observations $\hat{o}_r = G(o_s)$. During evaluation on the robotic system, the policy receives real image observations $o_r$. The actions $a$ of the policy are deltas in the gripper's Cartesian coordinates to solve a tissue retraction task.
  • Figure 3: (a) Tissue retraction scene implemented in SOFA with illustrated coordinate system and (b) experimental setup on the real robotic system.
  • Figure 4: Illustrated data flow for the four sim-to-real evaluation scenarios. All policy actions $a$ are executed on the real robot. For scenarios $\pi_s(O_s)$ and $\pi_g(G(O_s))$ the simulation is updated with the real robot states $s$ to generate observations $o_s$ from simulation. Scenarios $\pi_g(O_r)$ and $\pi_g(G(O_r))$ receive observations $o_r$ from the real system.
  • Figure 5: Smoothed learning curves for a training run of $\pi_g$. Average episode return, episode length, and steps in collision and workspace violation are shown over total steps in the learning environment. Three phases are identifiable during learning: when the agent is predominantly learning to grasp (purple), when it is learning to retract (blue), and when it is mainly optimizing to reduce episode length and collisions (green).
  • ...and 3 more figures
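
The data flow described in the captions of Figures 2 and 4 can be sketched as a routing table from scenario name to observation source. This is a hedged illustration only: `build_scenarios`, `apply_delta`, `render_sim`, and `capture_real` are hypothetical names, and the demo uses string-producing stubs to trace which components are called rather than real policies or images.

```python
from typing import Callable, Dict, Tuple

Vec3 = Tuple[float, float, float]


def build_scenarios(pi_s: Callable, pi_g: Callable, G: Callable,
                    render_sim: Callable, capture_real: Callable) -> Dict[str, Callable]:
    """Return one action-selection function per evaluation scenario.

    Each function maps the current real robot state s to an action a.
    """
    return {
        # The simulation is updated with the real robot state s and rendered to o_s.
        "pi_s(O_s)": lambda s: pi_s(render_sim(s)),
        "pi_g(G(O_s))": lambda s: pi_g(G(render_sim(s))),
        # Observations o_r come from the real camera, so s is not used for rendering.
        "pi_g(O_r)": lambda s: pi_g(capture_real()),
        "pi_g(G(O_r))": lambda s: pi_g(G(capture_real())),
    }


def apply_delta(position: Vec3, action: Vec3) -> Vec3:
    """Apply an action given as a delta in the gripper's Cartesian coordinates."""
    return tuple(p + d for p, d in zip(position, action))


if __name__ == "__main__":
    # Tracing stubs (assumptions): each returns a string naming what was called.
    scenarios = build_scenarios(
        pi_s=lambda o: f"pi_s({o})",
        pi_g=lambda o: f"pi_g({o})",
        G=lambda o: f"G({o})",
        render_sim=lambda s: "o_s",
        capture_real=lambda: "o_r",
    )
    for name, select_action in scenarios.items():
        print(name, "->", select_action((0.0, 0.0, 0.0)))
    print(apply_delta((0.0, 0.0, 0.0), (0.5, 0.0, -0.25)))
```

In every scenario the selected action would be executed on the real robot; only the source of the observation changes, which is what lets the four scenarios separate the visual gap from other sources of mismatch.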