Embedded Image-to-Image Translation for Efficient Sim-to-Real Transfer in Learning-based Robot-Assisted Soft Manipulation

Jacinto Colan, Keisuke Sugita, Ana Davila, Yutaro Yamada, Yasuhisa Hasegawa

TL;DR

This work targets the sim-to-real gap in learning-based robot-assisted soft manipulation for laparoscopic surgery by employing Contrastive Unpaired Translation (CUT) to convert simulated images into realistic ones and to extract embedded representations. A shared encoder across domains enables task-agnostic embeddings (via $\hat{z}^l = H(G_{\text{enc}}^l(\hat{y}))$ across $L$ layers and $S$ patches) that feed a tissue manipulation policy, accelerating learning. The approach is validated on a triangulation task using a SoftGym/NVIDIA FleX simulation and real-world data, showing that embedded representations yield faster convergence, higher early rewards, and a ~65% success rate, outperforming policies trained on original or translated images alone. These results suggest that embedding-based image-to-image transfer provides a robust, efficient pathway for autonomy in robot-assisted minimally invasive surgery (RAMIS) and for real-world deployment.
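
As a rough illustration of the embedding expression in the TL;DR, the sketch below samples $S$ spatial locations from each of $L$ encoder feature maps and passes each sampled feature vector through a small projection head. It is only a sketch under stated assumptions: the nested-list feature maps stand in for the encoder outputs $G_{\text{enc}}^l(\hat{y})$, and the two-layer head, the per-layer weights, the L2 normalization, and the names `project` and `patch_embeddings` are hypothetical choices, not the authors' implementation.

```python
import math
import random


def project(vec, w1, w2):
    """Hypothetical head H: linear -> ReLU -> linear, followed by L2 normalization."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in w1]
    out = [sum(w * h for w, h in zip(row, hidden)) for row in w2]
    norm = math.sqrt(sum(o * o for o in out)) or 1.0  # avoid division by zero
    return [o / norm for o in out]


def patch_embeddings(feature_maps, heads, num_patches, rng):
    """Embed S randomly sampled patch locations for each of the L layers.

    feature_maps[l] is a nested list shaped [channels][height][width]; it stands
    in for the encoder output at layer l (an assumption for illustration).
    heads[l] is a (w1, w2) weight pair; using one pair per layer is also an
    assumption, made because channel counts may differ between layers.
    Returns a list of L layers, each holding S embedding vectors.
    """
    embeddings = []
    for fmap, (w1, w2) in zip(feature_maps, heads):
        height, width = len(fmap[0]), len(fmap[0][0])
        layer_embeddings = []
        for _ in range(num_patches):
            r, c = rng.randrange(height), rng.randrange(width)
            patch_vec = [channel[r][c] for channel in fmap]  # feature vector at (r, c)
            layer_embeddings.append(project(patch_vec, w1, w2))
        embeddings.append(layer_embeddings)
    return embeddings
```

In this sketch the result is indexed as `embeddings[l][s]`, so a downstream policy could flatten or pool it into a single observation vector; how the paper aggregates the embeddings is not stated in this summary.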

Abstract

Recent advances in robotic learning in simulation have shown impressive results in accelerating the learning of complex manipulation skills. However, the sim-to-real gap, caused by discrepancies between simulation and reality, poses significant challenges for the effective deployment of autonomous surgical systems. We propose a novel approach utilizing image translation models to mitigate domain mismatches and facilitate efficient robot skill learning in a simulated environment. Our method uses contrastive unpaired image-to-image translation, which allows embedded representations to be acquired from the transformed images. These embeddings are then used to improve the efficiency of training surgical manipulation models. We conducted experiments to evaluate the performance of our approach, demonstrating that it significantly enhances task success rates and reduces the steps required for task completion compared to traditional methods. The results indicate that our proposed system effectively bridges the sim-to-real gap, providing a robust framework for advancing the autonomy of surgical robots in minimally invasive procedures.

Paper Structure

This paper contains 16 sections, 9 equations, 12 figures, and 1 table.

Figures (12)

  • Figure 1: Simulation of a dummy tissue used for learning-based tissue triangulation, aimed at visualizing the resection path.
  • Figure 2: Image preprocessing for resection path recognition.
  • Figure 3: Verification that the resection line is located inside the triangulation area. $P_i$ represents the endpoints $i \in \{1,2\}$ of the resection line, $A$, $B$, and $C$ represent the positions of the grippers, and $Q_i$ represents the points projected onto the plane defined by $A$, $B$, and $C$.
  • Figure 4: Experimental setup used for data collection.
  • Figure 5: Examples of source and target domain images. Left: dummy tissue in simulation. Right: real-world dummy tissue.
  • ...and 7 more figures
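
The check described in the Figure 3 caption can be sketched as follows: project each endpoint $P_i$ onto the plane through the gripper positions $A$, $B$, and $C$ to obtain $Q_i$, then test whether $Q_i$ lies inside triangle $ABC$. This is a minimal sketch, assuming an orthogonal projection and a barycentric-coordinate inside test; the function names and the tolerance `eps` are hypothetical, and the paper may implement the verification differently.

```python
def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def project_onto_plane(p, a, b, c):
    """Orthogonally project point p onto the plane through a, b, c (gives Q_i)."""
    n = cross(sub(b, a), sub(c, a))  # plane normal (not normalized)
    nn = dot(n, n)
    if nn == 0:
        raise ValueError("gripper positions are collinear; no plane is defined")
    t = dot(sub(p, a), n) / nn
    return tuple(pi - t * ni for pi, ni in zip(p, n))


def inside_triangle(q, a, b, c, eps=1e-9):
    """Barycentric test for a point q that already lies in the plane of a, b, c."""
    v0, v1, v2 = sub(b, a), sub(c, a), sub(q, a)
    d00, d01, d11 = dot(v0, v0), dot(v0, v1), dot(v1, v1)
    d20, d21 = dot(v2, v0), dot(v2, v1)
    denom = d00 * d11 - d01 * d01
    v = (d11 * d20 - d01 * d21) / denom
    w = (d00 * d21 - d01 * d20) / denom
    u = 1.0 - v - w
    return min(u, v, w) >= -eps


def resection_line_inside(p1, p2, a, b, c):
    """True if both projected endpoints fall inside the triangulation area."""
    return all(
        inside_triangle(project_onto_plane(p, a, b, c), a, b, c)
        for p in (p1, p2)
    )
```

Because a triangle is convex, testing the two projected endpoints is enough to conclude that the whole projected segment lies inside it.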