SurgicalGaussian: Deformable 3D Gaussians for High-Fidelity Surgical Scene Reconstruction

Weixing Xie, Junfeng Yao, Xianpeng Cao, Qiqin Lin, Zerui Tang, Xiao Dong, Xiaohu Guo

TL;DR

The paper tackles dynamic surgical scene reconstruction from endoscopic video, where sparse viewpoints and instrument occlusion make it hard for implicit NeRF-based methods to recover high-frequency detail. It introduces SurgicalGaussian, a deformable 3D Gaussian Splatting framework that represents the scene with Gaussians in a canonical space and uses a forward-mapping deformation MLP to predict their offsets in observation space, decoupling geometry from motion. A depth- and mask-based initialization strategy (GIDM) provides robust Gaussian placement, and optimization combines color and depth supervision, an SSIM term, and the regularizers L_pos, L_cov, and L_smooth for deformation consistency and occlusion-aware color. The approach renders tissue with high fidelity at real-time speeds (e.g., 80 FPS or more) with modest GPU usage, and it reports better reconstruction quality than state-of-the-art dynamic NeRF-based methods on the EndoNeRF and StereoMIS datasets. By enabling accurate, instrument-free reconstruction, the work supports robot-assisted surgery applications such as simulation, guidance, and training.
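
As a rough illustration of the forward-mapping idea above, the sketch below keeps a Gaussian fixed in canonical space and asks a small network for time-dependent offsets that move it into observation space. It is only a minimal sketch under stated assumptions, not the authors' implementation: the names `TinyDeformationMLP`, `positional_encoding`, and `deform`, the frequency counts, the layer sizes, the random untrained weights, and the choice to offset position, rotation, and scale are all hypothetical.

```python
import math
import random

POS_FREQS = 4   # assumed number of encoding frequencies for xyz
TIME_FREQS = 2  # assumed number of encoding frequencies for t
IN_DIM = 3 * 2 * POS_FREQS + 2 * TIME_FREQS  # sin/cos features for xyz plus t


def positional_encoding(values, num_freqs):
    """Map each scalar v to [sin(2^k v), cos(2^k v)] for k < num_freqs."""
    out = []
    for v in values:
        for k in range(num_freqs):
            out.append(math.sin((2.0 ** k) * v))
            out.append(math.cos((2.0 ** k) * v))
    return out


class TinyDeformationMLP:
    """Hypothetical stand-in for a deformation MLP with random, untrained weights."""

    def __init__(self, in_dim, hidden=16, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(hidden)]
        # 10 outputs: 3 position offsets, 4 rotation offsets, 3 scale offsets
        self.w2 = [[rng.gauss(0.0, 0.1) for _ in range(hidden)] for _ in range(10)]

    def __call__(self, features):
        hidden = [max(0.0, sum(w * f for w, f in zip(row, features))) for row in self.w1]
        out = [sum(w * h for w, h in zip(row, hidden)) for row in self.w2]
        return out[:3], out[3:7], out[7:]


def deform(gaussian, t, mlp):
    """Forward mapping: canonical Gaussian plus predicted offsets at time t."""
    features = positional_encoding(gaussian["position"], POS_FREQS)
    features += positional_encoding([t], TIME_FREQS)
    d_pos, d_rot, d_scale = mlp(features)
    # Return a new dict so the canonical Gaussian itself is never modified.
    return {
        "position": [p + d for p, d in zip(gaussian["position"], d_pos)],
        "rotation": [q + d for q, d in zip(gaussian["rotation"], d_rot)],
        "scale": [s + d for s, d in zip(gaussian["scale"], d_scale)],
    }


canonical = {
    "position": [0.1, -0.2, 0.3],
    "rotation": [1.0, 0.0, 0.0, 0.0],  # quaternion (w, x, y, z)
    "scale": [0.05, 0.05, 0.05],
}
mlp = TinyDeformationMLP(IN_DIM)
observed = deform(canonical, t=0.25, mlp=mlp)
```

Because the canonical Gaussians stay untouched and only the offsets depend on time, geometry and motion are stored separately, which is the decoupling the summary refers to.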

Abstract

Dynamic reconstruction of deformable tissues in endoscopic video is a key technology for robot-assisted surgery. Recent reconstruction methods based on neural radiance fields (NeRFs) have achieved remarkable results in the reconstruction of surgical scenes. However, because they rely on an implicit representation, NeRFs struggle to capture the intricate details of objects in the scene and cannot achieve real-time rendering. In addition, restricted single-view perception and instrument occlusion also pose special challenges in surgical scene reconstruction. To address these issues, we develop SurgicalGaussian, a deformable 3D Gaussian Splatting method to model dynamic surgical scenes. Our approach models the spatio-temporal features of soft tissues at each timestamp via a forward-mapping deformation MLP and regularization to constrain local 3D Gaussians to comply with consistent movement. With the depth initialization strategy and tool mask-guided training, our method can remove surgical instruments and reconstruct high-fidelity surgical scenes. Through experiments on various surgical videos, our network outperforms existing methods in many aspects, including rendering quality, rendering speed, and GPU usage. The project page can be found at https://surgicalgaussian.github.io.
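
One plausible reading of the depth initialization with tool masks is to back-project only the tissue pixels of a depth map into 3D and use the resulting points as initial Gaussian centers. The sketch below shows that step for a single frame. It is an assumption-laden illustration rather than the paper's procedure: the pinhole camera model, the intrinsics `fx`, `fy`, `cx`, `cy`, the convention that `True` in the mask marks an instrument pixel, and the function name `backproject_unmasked` are all hypothetical.

```python
def backproject_unmasked(depth, tool_mask, fx, fy, cx, cy):
    """Back-project valid, non-instrument depth pixels to camera-space points.

    depth:     rows of depth values (same units as the returned points)
    tool_mask: rows of booleans, True where an instrument covers the pixel
    """
    points = []
    for v, (depth_row, mask_row) in enumerate(zip(depth, tool_mask)):
        for u, (z, is_tool) in enumerate(zip(depth_row, mask_row)):
            # Skip instrument pixels and pixels without a valid depth value.
            if is_tool or z <= 0.0:
                continue
            # Pinhole model: pixel (u, v) at depth z maps to (x, y, z).
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


depth = [[2.0, 2.0],
         [0.0, 4.0]]
tool_mask = [[False, True],
             [False, False]]
initial_centers = backproject_unmasked(depth, tool_mask, fx=2.0, fy=2.0, cx=0.0, cy=0.0)
```

Regions that are hidden behind an instrument in one frame get no points from that frame, so a full initialization would presumably need to combine several frames; that merging step is not shown here.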

Paper Structure (10 sections, 9 equations, 2 figures, 2 tables)


Figures (2)

  • Figure 1: Framework of the proposed SurgicalGaussian.
  • Figure 2: Comparison of reconstruction results between our SurgicalGaussian and EndoNeRF, EndoSurf, LerPlane, and EndoGaussian.