Feature-EndoGaussian: Feature Distilled Gaussian Splatting in Surgical Deformable Scene Reconstruction
Kai Li, Junhao Wang, William Han, Ding Zhao
TL;DR
FE-4DGS addresses the challenge of real-time reconstruction and semantic segmentation of deformable surgical scenes by distilling 2D semantic features into a 4D Gaussian Splatting framework. It introduces a Feature-Spatiotemporal deformation module to update per-Gaussian geometry and semantics, and uses differentiable rendering with a semantic alignment loss to fuse geometry with semantics derived from SAM (Segment Anything Model). The approach achieves state-of-the-art rendering fidelity on EndoNeRF and SCARED while rendering in real time, and demonstrates strong binary segmentation and competitive multi-label segmentation on EndoVis18. This work enables unified reconstruction and segmentation in minimally invasive surgery (MIS), with implications for AR guidance and potential for language-guided editing in future systems.
Abstract
Minimally invasive surgery (MIS) requires high-fidelity, real-time visual feedback of dynamic and low-texture surgical scenes. To address these requirements, we introduce FeatureEndo-4DGS (FE-4DGS), the first real-time pipeline leveraging feature-distilled 4D Gaussian Splatting for simultaneous reconstruction and semantic segmentation of deformable surgical environments. Unlike prior feature-distilled methods, which are restricted to static scenes, and existing 4D approaches, which lack semantic integration, FE-4DGS leverages pre-trained 2D semantic embeddings to produce a unified 4D representation in which semantics also deform with tissue motion. This unified representation yields real-time RGB and semantic outputs through a single, parallelized rasterization process. Despite the additional complexity from feature distillation, FE-4DGS sustains real-time rendering (61 FPS) with a compact footprint and achieves state-of-the-art rendering fidelity on EndoNeRF (39.1 PSNR) and SCARED (27.3 PSNR). On EndoVis18 segmentation, it matches or exceeds strong 2D baselines for binary segmentation (0.93 DSC) and remains competitive for multi-label segmentation (0.77 DSC).
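
For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of their standard definitions: PSNR (in dB) for rendering fidelity and the Dice similarity coefficient (DSC) for segmentation overlap. It is not the paper's evaluation code. The function names, the flat-list input format, the `max_value=1.0` intensity range, and the empty-mask convention are all assumptions made for illustration.

```python
import math


def dice_coefficient(pred, target):
    """Dice similarity coefficient between two flat binary masks.

    DSC = 2 * |pred AND target| / (|pred| + |target|).
    Inputs are equal-length sequences of 0/1 (hypothetical format).
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return 2.0 * intersection / total


def psnr(rendered, reference, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two flat images.

    PSNR = 10 * log10(max_value^2 / MSE).
    Inputs are equal-length, non-empty sequences of pixel intensities
    in [0, max_value] (hypothetical format).
    """
    if len(rendered) != len(reference) or len(rendered) == 0:
        raise ValueError("images must be non-empty and the same length")
    mse = sum((a - b) ** 2 for a, b in zip(rendered, reference)) / len(rendered)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Under these definitions, a DSC of 0.93 means the predicted and ground-truth masks overlap almost completely, and each additional 10 dB of PSNR corresponds to a tenfold reduction in mean squared error.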
