SimuScope: Realistic Endoscopic Synthetic Dataset Generation through Surgical Simulation and Diffusion Models

Sabina Martyniak, Joanna Kaleta, Diego Dall'Alba, Michał Naskręt, Szymon Płotka, Przemysław Korzeniowski

TL;DR

SimuScope tackles the data bottleneck in computer-assisted surgery (CAS) by combining a high-fidelity surgical simulator, which automatically generates rich annotations, with a diffusion-model pipeline for photorealistic endoscopic imagery. It fine-tunes Stable Diffusion using two LoRA adapters (CholectG45 and CholectL45) and employs ControlNet/ControlNet++ conditioning (SoftEdge, Depth, Reference) to produce realistic images that preserve the simulator-derived labels. Quantitative results show substantial improvements over the raw simulator across $\text{mIoU}$, $\text{FID}$, $\text{KID}$, and other fidelity/diversity metrics, while enabling efficient training data generation. The approach promises practical impact for CAS by delivering large, richly labeled, realistic datasets, though temporal coherence in video remains an open area for future work.

Abstract

Computer-assisted surgical (CAS) systems enhance surgical execution and outcomes by providing advanced support to surgeons. These systems often rely on deep learning models trained on complex, challenging-to-annotate data. While synthetic data generation can address these challenges, enhancing the realism of such data is crucial. This work introduces a multi-stage pipeline for generating realistic synthetic data, featuring a fully-fledged surgical simulator that automatically produces all necessary annotations for modern CAS systems. This simulator generates a wide set of annotations that surpass those available in public synthetic datasets. Additionally, it offers a more complex and realistic simulation of surgical interactions, including the dynamics between surgical instruments and deformable anatomical environments, outperforming existing approaches. To further bridge the visual gap between synthetic and real data, we propose a lightweight and flexible image-to-image translation method based on Stable Diffusion (SD) and Low-Rank Adaptation (LoRA). This method leverages a limited amount of annotated data, enables efficient training, and maintains the integrity of annotations generated by our simulator. The proposed pipeline is experimentally validated and can translate synthetic images into images with real-world characteristics, which can generalize to real-world context, thereby improving both training and CAS guidance. The code and the dataset are available at https://github.com/SanoScience/SimuScope.
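The abstract calls the LoRA-based translation method lightweight and efficient to train. The sketch below illustrates the general idea behind Low-Rank Adaptation, not the authors' implementation: a frozen weight matrix W is left untouched, and only two small factors B and A are trained, giving an effective weight W + s·(B·A). The toy matrices, the layer size of 768, and the rank of 4 are hypothetical values chosen for illustration; the scale s = 0.45 simply mirrors the adapter weight that appears in the prompts quoted in the Figure 2 caption. Real implementations often also fold an alpha/rank factor into the scale, which is omitted here.

```python
# Toy illustration of a LoRA-style update; all sizes and values are made up.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_effective_weight(w, b, a, scale):
    """Return W + scale * (B @ A) without modifying the frozen weight W."""
    delta = matmul(b, a)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

def param_counts(d_out, d_in, rank):
    """Trainable parameters: full fine-tuning vs. a rank-`rank` adapter."""
    return d_out * d_in, rank * (d_out + d_in)

# Frozen 3x3 weight (identity here) and a rank-1 adapter: B is 3x1, A is 1x3.
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
B = [[1.0], [0.0], [2.0]]
A = [[0.5, 0.0, -1.0]]

# Hypothetical adapter strength, echoing the 0.45 used in the quoted prompts.
W_eff = lora_effective_weight(W, B, A, scale=0.45)

# Hypothetical 768x768 layer with rank 4: the adapter trains far fewer values.
full, adapter = param_counts(768, 768, 4)
print(full, adapter)
```

Because only B and A are trained, an adapter of this kind can be stored separately from the base model and blended in at a chosen strength, which is consistent with the two named adapters (CholectG45 and CholectL45) being applied with explicit weights.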

Paper Structure

This paper contains 13 sections, 5 equations, 10 figures, 3 tables.

Figures (10)

  • Figure 1: Comparison of real images and images generated by SimuScope. The real images used in this comparison are sourced from the CholecT45 dataset [cholect45], which comprises authentic surgical footage. The real and generated images exhibit comparable coloration and textural detail, making them hard to tell apart at a glance. This similarity underscores the fidelity and realism achieved by SimuScope in simulating surgical scenarios.
  • Figure 1: SimuScope generated images. A collection of sample generated images from our simulator is presented here, showcasing a diverse range of perspectives. These examples highlight the simulator's capability to produce detailed visualizations from different angles and orientations. The images encompass various anatomical regions and demonstrate the versatility of the simulation in replicating realistic scenarios.
  • Figure 2: An overview of the SimuScope fine-tuning and inference stages. The SD model is fine-tuned using LoRA, a framework that associates a unique LoRA identifier and weight with the newly integrated cholect45 style. During inference, SimuScope uses three ControlNet/ControlNet++ models for conditioning. The raw input sample, along with the prompts 'lora:CholectL45:0.45 cholect45' and 'lora:CholectG45:0.45 cholect45', is fed into SimuScope. The SoftEdge ControlNet++ processes edges predicted by HED, the Depth ControlNet++ handles depth estimated by MiDaS, and the Reference ControlNet uses an additional real input sample as a reference. This multi-model integration lets SimuScope generate realistic surgical images that draw on texture, edge, depth, and reference information.
  • Figure 2: SimuScope artifacts. During the generation of images, various artifacts appeared. As shown in the attached images, these mainly include the addition of instruments in parts of the liver or abdominal wall, which can be observed in the first four images from the first row. These artifacts may interfere with accurate interpretation by introducing extraneous elements that do not belong to the actual anatomical structures. The next four images from the second row exhibit artifacts related to color saturation, where abnormal intensities and hues may distort the visual information. These color saturation artifacts can obscure important details and mislead diagnostic assessments by creating false impressions of tissue characteristics.
  • Figure 3: An overview of the stages of a virtual cholecystectomy: (a) the start of the procedure, showing the initial setup with surgical instruments inserted into the abdominal cavity; (b) the dissection of Calot's triangle using a grasper and diathermy hook; (c) the clipping of the cystic duct and artery with a clipping tool; (d) the cutting of the cystic duct and artery with scissors; (e) the dissection of the gallbladder from the liver bed using a hook; and (f) the gallbladder fully dissected and ready for removal from the abdominal cavity.
  • ...and 5 more figures