Diff2DGS: Reliable Reconstruction of Occluded Surgical Scenes via 2D Gaussian Splatting

Tianyi Song, Danail Stoyanov, Evangelos Mazomenos, Francisco Vasconcelos

TL;DR

Diff2DGS tackles occlusion-aware real-time reconstruction of deformable surgical scenes by coupling a diffusion-based instrument inpainting stage with a Learnable Deformation Model-enhanced 2D Gaussian Splatting pipeline. It introduces an adaptive depth loss to dynamically balance appearance and geometry during training, leading to improved 3D fidelity alongside high-quality rendering. Evaluations on EndoNeRF, StereoMIS, and SCARED show state-of-the-art PSNR and depth metrics, confirming that optimizing image quality alone does not guarantee accurate geometry. The framework achieves real-time rendering and robust occlusion handling, advancing intraoperative guidance and automation capabilities; future work will extend robustness to camera motion.

Abstract

Real-time reconstruction of deformable surgical scenes is vital for advancing robotic surgery, improving surgeon guidance, and enabling automation. Recent methods achieve dense reconstructions from da Vinci robotic surgery videos, with Gaussian Splatting (GS) offering real-time performance via graphics acceleration. However, reconstruction quality in occluded regions remains limited, and depth accuracy has not been fully assessed, as benchmarks like EndoNeRF and StereoMIS lack 3D ground truth. We propose Diff2DGS, a novel two-stage framework for reliable 3D reconstruction of occluded surgical scenes. In the first stage, a diffusion-based video module with temporal priors inpaints tissue occluded by instruments with high spatial-temporal consistency. In the second stage, we adapt 2D Gaussian Splatting (2DGS) with a Learnable Deformation Model (LDM) to capture dynamic tissue deformation and anatomical geometry. We also extend evaluation beyond prior image-quality metrics by performing quantitative depth accuracy analysis on the SCARED dataset. Diff2DGS outperforms state-of-the-art approaches in both appearance and geometry, reaching 38.02 dB PSNR on EndoNeRF and 34.40 dB on StereoMIS. Furthermore, our experiments demonstrate that optimizing for image quality alone does not necessarily translate into optimal 3D reconstruction accuracy. To address this, we further optimize the depth quality of the reconstructed 3D results, ensuring more faithful geometry in addition to high-fidelity appearance.
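
For readers less familiar with the image-quality metric quoted in the abstract, PSNR (peak signal-to-noise ratio) is defined as 10·log10(MAX² / MSE), where MSE is the mean squared error between the reference and rendered pixels and MAX is the largest possible pixel value. The following is a minimal sketch of that standard definition, assuming pixel intensities normalized to [0, 1] and flattened into a single sequence. The function name `psnr` and the sample values are illustrative and do not come from the paper. The paper's exact evaluation protocol, such as any masking of instrument pixels, is not described in this summary and is not reproduced here.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], rendered: Sequence[float],
         max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences.

    Assumption: both inputs are flattened images with values in [0, max_value].
    """
    if len(reference) != len(rendered) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all pixels.
    mse = sum((r - x) ** 2 for r, x in zip(reference, rendered)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    # Illustrative values only, not data from the paper.
    ref = [0.0, 0.5, 1.0, 0.25]
    out = [0.1, 0.5, 0.9, 0.25]
    print(f"{psnr(ref, out):.2f} dB")
```

Because the scale is logarithmic, each 10 dB gain corresponds to a tenfold reduction in mean squared error, so small differences in dB between methods can reflect sizeable differences in pixel error.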

Paper Structure (12 sections, 19 equations, 6 figures, 5 tables)

Figures (6)

  • Figure 1: Traditional endoscopic scene reconstruction methods often focus solely on image quality from the camera viewpoint, neglecting the depth accuracy of the reconstruction results. Consequently, when the camera viewpoint changes, reconstruction accuracy degrades significantly. We present Diff2DGS, a framework that effectively balances depth information and image quality, achieving high-quality 3D reconstruction while more effectively eliminating artifacts in occluded regions and providing more accurate depth estimation.
  • Figure 2: Our reliable surgical scene reconstruction framework, Diff2DGS, consists of Surgical Instrument Inpainting, Point Cloud Initialization, Deformation Modeling, and 2D Gaussian Splatting.
  • Figure 3: Visualization of the 3D reconstruction results.
  • Figure 4: Depth quality visualization on the SCARED dataset.
  • Figure 5: Visual comparison on the SCARED dataset.
  • ...and 1 more figure