Instrument-Splatting: Controllable Photorealistic Reconstruction of Surgical Instruments Using Gaussian Splatting

Shuojue Yang, Zijian Wu, Mingxuan Hong, Qian Li, Daiyun Shen, Septimiu E. Salcudean, Yueming Jin

TL;DR

The paper tackles the problem of producing photorealistic, controllable 3D assets for surgical instruments from monocular videos to enable Real2Sim in robotic surgery. It introduces Instrument-Splatting, a pipeline that combines 3D Gaussian Splatting with geometry pretraining to bind Gaussians to instrument parts and a pose estimation/tracking framework built on a render-and-compare objective, followed by texture learning to achieve realism. Key contributions include geometry-guided binding of Gaussian points to CAD parts, a semantics-aware Gaussian representation for articulated instruments, robust pose initialization and tracking under large inter-frame motion, and a texture-learning regime that preserves geometry while learning appearance. The approach achieves superior pose accuracy and photorealistic reconstruction on EndoVis and in-house datasets, outperforming state-of-the-art methods and enabling highly realistic digital twins for surgical AI and autonomy applications.

Abstract

Real2Sim is becoming increasingly important with the rapid development of surgical artificial intelligence (AI) and autonomy. In this work, we propose a novel Real2Sim methodology, Instrument-Splatting, that leverages 3D Gaussian Splatting to provide fully controllable 3D reconstruction of surgical instruments from monocular surgical videos. To maintain both high visual fidelity and manipulability, we introduce a geometry pre-training stage that binds Gaussian point clouds to part meshes with accurate geometric priors, and we define forward kinematics so that the Gaussians can be controlled as flexibly as real instruments. Afterward, to handle unposed videos, we design a novel instrument pose tracking method that leverages semantics-embedded Gaussians to robustly refine per-frame instrument poses and joint states in a render-and-compare manner, which allows our instrument Gaussians to accurately learn textures and reach photorealistic rendering. We validated our method on 2 publicly released surgical videos and 4 videos collected on ex vivo tissues and green screens. Quantitative and qualitative evaluations demonstrate the effectiveness and superiority of the proposed method.
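
To make the idea of controlling part-bound Gaussians through forward kinematics concrete, here is a minimal sketch. It is not the authors' implementation. It assumes a simple serial chain of revolute joints about each part's local z-axis, whereas the real instrument has its own kinematic structure. All names (`forward_kinematics`, `pose_gaussians`, `link_offsets`, `part_ids`) are hypothetical. Only the Gaussian centers are moved; a fuller version would also rotate each Gaussian's covariance.

```python
import numpy as np

def rot_z(theta):
    """3x3 rotation about the z-axis by angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def make_transform(R, t):
    """Pack a rotation and a translation into a 4x4 homogeneous transform."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def forward_kinematics(joint_angles, link_offsets):
    """Return one 4x4 base-frame transform per part of a serial chain.

    Part 0 is the base (identity). Part k is reached by composing the
    first k joints, each a z-rotation placed at a fixed link offset.
    This chain layout is an illustrative assumption.
    """
    transforms = [np.eye(4)]
    for theta, offset in zip(joint_angles, link_offsets):
        local = make_transform(rot_z(theta), offset)
        transforms.append(transforms[-1] @ local)
    return np.stack(transforms)

def pose_gaussians(means_local, part_ids, part_transforms):
    """Map Gaussian centers from their part-local frames to the base frame.

    means_local: (N, 3) centers, each expressed in its own part's frame.
    part_ids:    (N,) integer index of the part each Gaussian is bound to.
    """
    T = part_transforms[part_ids]            # (N, 4, 4), one per Gaussian
    R, t = T[:, :3, :3], T[:, :3, 3]
    return np.einsum("nij,nj->ni", R, means_local) + t

# Usage: two Gaussians, one bound to the base and one to the first joint.
part_transforms = forward_kinematics(
    joint_angles=[np.pi / 2],
    link_offsets=[np.array([0.0, 0.0, 1.0])],
)
means_local = np.array([[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
part_ids = np.array([0, 1])
posed = pose_gaussians(means_local, part_ids, part_transforms)
```

In the usage above, the Gaussian bound to the base stays where it is, while the one bound to the jointed part is rotated by 90 degrees about z and shifted by the link offset. Because every Gaussian follows its part's rigid transform, changing the joint angles moves the whole instrument without touching the learned appearance.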

Paper Structure

This paper contains 11 sections, 3 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Instrument-Splatting Pipeline.
  • Figure 2: Forward kinematics of the LND.
  • Figure 3: Overview diagram of Instrument-Splatting methodology.
  • Figure 4: Visualization of the projected CAD mesh with the estimated poses and corresponding original images.
  • Figure 5: Visualization of the novel-view rendering results of different methods.