SimuScope: Realistic Endoscopic Synthetic Dataset Generation through Surgical Simulation and Diffusion Models
Sabina Martyniak, Joanna Kaleta, Diego Dall'Alba, Michał Naskręt, Szymon Płotka, Przemysław Korzeniowski
TL;DR
SimuScope tackles the data bottleneck in computer-assisted surgery (CAS) by pairing a high-fidelity surgical simulator, which automatically generates rich annotations, with a diffusion-model pipeline for photorealistic endoscopic imagery. It fine-tunes Stable Diffusion with two LoRA adapters (CholectG45 and CholectL45) and uses ControlNet++ conditioning (SoftEdge, Depth, Reference) to produce realistic images that preserve the simulator-derived labels. Quantitative results show substantial improvements over the raw simulator output on mIoU, FID, KID, and other fidelity/diversity metrics, while enabling efficient generation of training data. The approach promises practical impact for CAS by delivering large, richly labeled, realistic datasets, though temporal coherence in video remains an open area for future work.
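For readers unfamiliar with the segmentation metric named above, the following is a minimal sketch of how a mean intersection-over-union (mIoU) score is commonly computed between a predicted and a reference label mask. It is not the authors' evaluation code: the function name `mean_iou`, the nested-list mask representation, and the choice to average only over classes that appear in at least one of the two masks are assumptions made for illustration.

```python
from typing import Sequence


def mean_iou(
    pred: Sequence[Sequence[int]],
    target: Sequence[Sequence[int]],
    num_classes: int,
) -> float:
    """Mean IoU over the classes present in `pred` or `target`.

    Hypothetical helper: masks are 2D grids of integer class ids in
    the range [0, num_classes).
    """
    intersection = [0] * num_classes
    union = [0] * num_classes

    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == t:
                # Pixel agrees: counts once in both intersection and union.
                intersection[p] += 1
                union[p] += 1
            else:
                # Pixel disagrees: it belongs to the union of both classes.
                union[p] += 1
                union[t] += 1

    # Skip classes that are absent from both masks (assumed convention).
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    if not ious:
        raise ValueError("no class present in either mask")
    return sum(ious) / len(ious)


if __name__ == "__main__":
    predicted = [[0, 0], [1, 1]]
    reference = [[0, 1], [1, 1]]
    print(mean_iou(predicted, reference, num_classes=2))
```

In this toy case class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12. Other averaging conventions exist (for example, accumulating counts over a whole dataset before dividing), and results from different conventions are not directly comparable.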
Abstract
Computer-assisted surgical (CAS) systems enhance surgical execution and outcomes by providing advanced support to surgeons. These systems often rely on deep learning models trained on complex, challenging-to-annotate data. While synthetic data generation can address these challenges, enhancing the realism of such data is crucial. This work introduces a multi-stage pipeline for generating realistic synthetic data, featuring a fully-fledged surgical simulator that automatically produces all necessary annotations for modern CAS systems. This simulator generates a wide set of annotations that surpass those available in public synthetic datasets. Additionally, it offers a more complex and realistic simulation of surgical interactions than existing approaches, including the dynamics between surgical instruments and deformable anatomical environments. To further bridge the visual gap between synthetic and real data, we propose a lightweight and flexible image-to-image translation method based on Stable Diffusion (SD) and Low-Rank Adaptation (LoRA). This method leverages a limited amount of annotated data, enables efficient training, and maintains the integrity of annotations generated by our simulator. The proposed pipeline is experimentally validated and can translate synthetic images into images with real-world characteristics that generalize to real-world contexts, thereby improving both training and CAS guidance. The code and the dataset are available at https://github.com/SanoScience/SimuScope.
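As background on why LoRA keeps fine-tuning lightweight, the standard low-rank update can be written as below. This is the general LoRA formulation, not a statement of the authors' configuration: the rank $r$, the scaling factor $\alpha$, and the choice of adapted layers are not specified in this summary and are left symbolic.

```latex
% Standard LoRA update for a frozen pretrained weight matrix W_0.
% Only A and B are trained; r and \alpha are hyperparameters (values assumed unspecified here).
$$
W' = W_0 + \frac{\alpha}{r}\, B A,
\qquad
W_0 \in \mathbb{R}^{d \times k},\;
B \in \mathbb{R}^{d \times r},\;
A \in \mathbb{R}^{r \times k},\;
r \ll \min(d, k)
$$
```

The number of trainable parameters per adapted layer falls from $dk$ to $r(d + k)$, which is what makes training on a limited amount of annotated data practical.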
