EndoGaussian: Real-time Gaussian Splatting for Dynamic Endoscopic Scene Reconstruction
Yifan Liu, Chenxin Li, Chen Yang, Yixuan Yuan
TL;DR
This work tackles real-time reconstruction of deformable endoscopic scenes, where prior NeRF-like approaches render too slowly for intraoperative use. It introduces EndoGaussian, a reconstruction framework based on 3D Gaussian Splatting (3DGS) that uses Holistic Gaussian Initialization (HGI) to rapidly seed dense Gaussians from depth predictions, and Spatio-temporal Gaussian Tracking (SGT) to model tissue deformation with a lightweight HexPlane-based encoding voxel and a small deformation decoder. The method achieves state-of-the-art reconstruction quality (~37.85 PSNR) with real-time rendering at ~195 FPS, roughly a 100× rendering speedup over existing methods, while requiring only about 2 minutes of training per scene and ~2 GB of GPU memory. This makes it practical for intraoperative use in robot-assisted minimally invasive surgery (RAMIS), giving surgeons real-time, geometry-consistent visualizations and paving the way for real-time surgical scene understanding and assistance.
Abstract
Reconstructing deformable tissues from endoscopic videos is essential in many downstream surgical applications. However, existing methods suffer from slow rendering speed, greatly limiting their practical use. In this paper, we introduce EndoGaussian, a real-time endoscopic scene reconstruction framework built on 3D Gaussian Splatting (3DGS). By integrating the efficient Gaussian representation and a highly optimized rendering engine, our framework boosts rendering speed to a real-time level. To adapt 3DGS to endoscopic scenes, we propose two strategies, Holistic Gaussian Initialization (HGI) and Spatio-temporal Gaussian Tracking (SGT), to handle the non-trivial Gaussian initialization and tissue deformation problems, respectively. In HGI, we leverage recent depth estimation models to predict depth maps of input binocular/monocular image sequences, based on which pixels are re-projected and combined for holistic initialization. In SGT, we propose to model surface dynamics using a deformation field, which is composed of an efficient encoding voxel and a lightweight deformation decoder, allowing for Gaussian tracking with minor training and rendering burden. Experiments on public datasets demonstrate our advantages over prior state-of-the-art methods in several aspects, including faster rendering (195 FPS in real time, a 100× gain), better rendering quality (37.848 PSNR), and less training overhead (within 2 min/scene), showing significant promise for intraoperative surgery applications. Code is available at: https://yifliu3.github.io/EndoGaussian/.
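The re-projection step that HGI relies on can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a pinhole camera with known intrinsics `K`, a camera that stays fixed so all frames share one coordinate frame, and optional per-frame tissue masks. The function names `backproject_depth` and `holistic_init` are hypothetical.

```python
import numpy as np


def backproject_depth(depth, K, mask=None):
    """Lift a depth map (H, W) to an (N, 3) point cloud in the camera frame.

    Assumes a pinhole model with intrinsics K (3x3). Pixels with
    non-positive depth, or excluded by the optional mask, are dropped.
    """
    h, w = depth.shape
    # u indexes columns (x direction), v indexes rows (y direction).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0
    if mask is not None:
        valid = valid & mask.astype(bool)
    z = depth[valid]
    x = (u[valid] - K[0, 2]) * z / K[0, 0]
    y = (v[valid] - K[1, 2]) * z / K[1, 1]
    return np.stack([x, y, z], axis=1)


def holistic_init(depths, K, masks=None):
    """Combine back-projected points from all frames into one point set.

    Assumption: the camera is static, so no per-frame pose is applied.
    The result would serve as initial Gaussian centers.
    """
    clouds = []
    for i, depth in enumerate(depths):
        mask = None if masks is None else masks[i]
        clouds.append(backproject_depth(depth, K, mask))
    return np.concatenate(clouds, axis=0)


if __name__ == "__main__":
    K = np.array([[500.0, 0.0, 320.0],
                  [0.0, 500.0, 256.0],
                  [0.0, 0.0, 1.0]])
    # Synthetic depth maps stand in for a depth estimator's predictions.
    depths = [np.full((512, 640), 50.0) for _ in range(3)]
    points = holistic_init(depths, K)
    print(points.shape)
```

Combining points across frames is what lets regions occluded in one frame (for example, by an instrument) still receive initial points from frames where they are visible; in practice the merged cloud would likely be subsampled before use.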
