Realistic Surgical Image Dataset Generation Based On 3D Gaussian Splatting

Tianle Zeng, Gerardo Loza Galindo, Junlei Hu, Pietro Valdastri, Dominic Jones

TL;DR

This work addresses the data scarcity challenge in RAMIS by introducing a 3D Gaussian Splatting pipeline that generates high-fidelity synthetic surgical images. The method models background scenes and surgical instruments separately as 3D Gaussians, enables precise instrument editing and scene fusion, and automatically produces pixel-perfect annotations through differentiable rendering. Experiments show that the synthetic data achieves high image quality (PSNR ≈ 27–29 dB) and that a YOLOv5 model trained on it outperforms a model trained on real ground-truth (GT) data by 12% on unseen real-world images. The approach offers a scalable way to augment surgical datasets with accurate annotations and could accelerate AI-assisted RAMIS development, although the current work is limited to static scenes. Overall, the study demonstrates that 3D Gaussian Splatting is a practical alternative to NeRF-based methods for medical dataset generation.
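
The TL;DR states that annotations are produced automatically and then used to train YOLOv5. As a minimal sketch, assuming a binary per-pixel instrument mask is already available (for example by thresholding a rendered instrument-label channel; this step is an assumption, not the authors' stated procedure), the mask can be turned into a YOLO-format label line: class, x-centre, y-centre, width, height, all normalised to [0, 1]. The function name mask_to_yolo_bbox is hypothetical.

```python
from typing import Optional

import numpy as np


def mask_to_yolo_bbox(mask: np.ndarray, class_id: int = 0) -> Optional[str]:
    """Convert a binary instrument mask of shape (H, W) into one YOLO label line.

    Hypothetical helper: assumes the mask marks every visible instrument pixel.
    Returns None when no instrument pixel is present in this view.
    """
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None  # instrument not visible, so no label for this image
    h, w = mask.shape
    # Use pixel-edge bounds: the box runs from the left edge of the first
    # foreground column to the right edge of the last one (hence the +1).
    x_min, x_max = xs.min(), xs.max() + 1
    y_min, y_max = ys.min(), ys.max() + 1
    x_c = (x_min + x_max) / 2 / w  # normalised box centre
    y_c = (y_min + y_max) / 2 / h
    bw = (x_max - x_min) / w       # normalised box size
    bh = (y_max - y_min) / h
    return f"{class_id} {x_c:.6f} {y_c:.6f} {bw:.6f} {bh:.6f}"
```

For a 4×8 mask whose foreground fills rows 1–2 and columns 2–5, the box is centred in the image and covers half of each dimension, so the function returns "0 0.500000 0.500000 0.500000 0.500000". Because the label is derived from the mask rather than drawn by hand, its accuracy depends only on how well the mask matches the rendered instrument.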

Abstract

Computer vision technologies markedly enhance the automation capabilities of robotic-assisted minimally invasive surgery (RAMIS) through advanced tool tracking, detection, and localization. However, the limited availability of comprehensive surgical training datasets remains a significant challenge in this field. This research introduces a novel method that employs 3D Gaussian Splatting to generate synthetic surgical datasets. We propose a method for extracting 3D Gaussian representations of surgical instruments and background operating environments and then transforming and combining them to generate high-fidelity synthetic surgical scenarios. We also developed a data recording system that acquires images alongside tool and camera poses in a surgical scene. Using this pose data, we synthetically replicate the recorded scene, which enables direct comparison of the synthetic images against the real ones (PSNR of 29.592 dB). As further validation, we trained two YOLOv5 models, one on the synthetic data and one on the real data, and evaluated both on an unseen real-world test dataset. The synthetic-trained model outperforms the real-trained model by 12% on this real-world test data.
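
The transform-and-combine step described in the abstract can be sketched as follows, under simplifying assumptions: each Gaussian is stored with an explicit 3×3 covariance and a plain RGB colour (the original 3D Gaussian Splatting formulation stores scales, rotation quaternions, and spherical-harmonic colour coefficients, which would also need rotating), and the dictionary layout and the names transform_gaussians and fuse_scene are hypothetical. A rigid instrument pose (R, t) moves each centre μ to Rμ + t and rotates each covariance Σ to RΣRᵀ; the posed instrument Gaussians are then concatenated with the background Gaussians, and a per-Gaussian label records which object each one belongs to.

```python
from typing import Dict

import numpy as np

GaussianSet = Dict[str, np.ndarray]  # hypothetical layout: means, covs, colors, opacities


def transform_gaussians(means: np.ndarray, covs: np.ndarray,
                        R: np.ndarray, t: np.ndarray):
    """Apply the rigid motion x -> R @ x + t to N Gaussians.

    means: (N, 3) centres, covs: (N, 3, 3) covariances, R: (3, 3) rotation, t: (3,).
    Centres move to R @ mu + t; covariances rotate to R @ Sigma @ R.T,
    so each Gaussian's extent (its eigenvalues) is preserved.
    """
    new_means = means @ R.T + t
    new_covs = R @ covs @ R.T  # matmul broadcasts R over the N covariances
    return new_means, new_covs


def fuse_scene(background: GaussianSet, tool: GaussianSet,
               R: np.ndarray, t: np.ndarray) -> GaussianSet:
    """Pose the instrument Gaussians with (R, t) and merge them into the background.

    Adds a 'labels' array: 0 for background Gaussians, 1 for instrument Gaussians.
    """
    tool_means, tool_covs = transform_gaussians(tool["means"], tool["covs"], R, t)
    n_bg, n_tool = len(background["means"]), len(tool["means"])
    return {
        "means": np.concatenate([background["means"], tool_means]),
        "covs": np.concatenate([background["covs"], tool_covs]),
        # Plain RGB is view-independent, so it needs no rotation (SH colours would).
        "colors": np.concatenate([background["colors"], tool["colors"]]),
        "opacities": np.concatenate([background["opacities"], tool["opacities"]]),
        "labels": np.concatenate([np.zeros(n_bg, dtype=np.int64),
                                  np.ones(n_tool, dtype=np.int64)]),
    }
```

For example, rotating an instrument Gaussian centred at (1, 0, 0) with covariance diag(4, 1, 1) by 90° about the z-axis and translating it by (1, 2, 3) places it at (1, 3, 3) with covariance diag(1, 4, 1). Keeping the label alongside every Gaussian means it could, in principle, be rendered like any other per-Gaussian attribute to obtain an instrument mask.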

Paper Structure (12 sections, 5 equations, 4 figures, 2 tables)

Figures (4)

  • Figure 1: Pipeline of proposed method.
  • Figure 2: Using circular sampling, we center the surgical tool in the scene, resulting in a dense distribution at the center (highlighted by a red rectangle).
  • Figure 3: Our dataset recording platform.
  • Figure 4: Illustration of GT and synthetic pairs for the most and least similar images, indicating the regions of increased difference.
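
Figure 4 shows the most and least similar GT/synthetic pairs. Assuming similarity is ranked with the same PSNR metric the abstract reports (an assumption; the caption does not name the metric), the ranking can be sketched with the standard definition PSNR = 10 · log10(MAX² / MSE) in dB, where MAX is the largest possible pixel value (255 for 8-bit images). The names psnr and rank_pairs are hypothetical.

```python
import numpy as np


def psnr(gt: np.ndarray, synth: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    # Cast to float first so uint8 subtraction cannot wrap around.
    mse = np.mean((gt.astype(np.float64) - synth.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))


def rank_pairs(gts, synths):
    """Hypothetical helper: order GT/synthetic pairs from most to least similar.

    Returns (indices sorted by descending PSNR, list of per-pair PSNR scores).
    """
    scores = [psnr(g, s) for g, s in zip(gts, synths)]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order, scores
```

As a worked check, two 8-bit images that differ by exactly 1 at every pixel have MSE = 1, giving PSNR = 10 · log10(255²) ≈ 48.13 dB; the reported values of roughly 27–29 dB therefore correspond to a considerably larger average per-pixel error.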