Advancing Dense Endoscopic Reconstruction with Gaussian Splatting-driven Surface Normal-aware Tracking and Mapping

Yiming Huang, Beilei Cui, Long Bai, Zhen Chen, Jinlin Wu, Zhen Li, Hongbin Liu, Hongliang Ren

TL;DR

Endo-2DTAM is a real-time endoscopic SLAM system that uses 2D Gaussian Splatting (2DGS) to deliver geometrically accurate reconstructions with high-quality novel-view rendering. By embedding surface normal information into tracking and mapping, and by employing pose-consistent keyframe sampling and bundle adjustment (BA), it addresses the multi-view depth inconsistencies inherent in 3DGS-based approaches. The method achieves state-of-the-art depth reconstruction on public endoscopic data ($1.87\pm0.63$ mm RMSE) while maintaining real-time performance, and it produces visually faithful renderings and accurate surface normals. This work advances intraoperative visualization and has the potential to improve surgical navigation and planning in minimally invasive procedures.

Abstract

Simultaneous Localization and Mapping (SLAM) is essential for precise surgical interventions and robotic tasks in minimally invasive procedures. While recent advancements in 3D Gaussian Splatting (3DGS) have improved SLAM with high-quality novel view synthesis and fast rendering, these systems struggle with accurate depth and surface reconstruction due to multi-view inconsistencies. Simply combining SLAM with 3DGS leads to mismatches between the reconstructed frames. In this work, we present Endo-2DTAM, a real-time endoscopic SLAM system with 2D Gaussian Splatting (2DGS) to address these challenges. Endo-2DTAM incorporates a surface normal-aware pipeline, which consists of tracking, mapping, and bundle adjustment modules for geometrically accurate reconstruction. Our robust tracking module combines point-to-point and point-to-plane distance metrics, while the mapping module utilizes normal consistency and depth distortion to enhance surface reconstruction quality. We also introduce a pose-consistent strategy for efficient and geometrically coherent keyframe sampling. Extensive experiments on public endoscopic datasets demonstrate that Endo-2DTAM achieves an RMSE of $1.87\pm 0.63$ mm for depth reconstruction of surgical scenes while maintaining computationally efficient tracking, high-quality visual appearance, and real-time rendering. Our code will be released at github.com/lastbasket/Endo-2DTAM.
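
The abstract states that tracking combines point-to-point and point-to-plane distance metrics but does not give the formulas here. As a rough illustration of the two standard metrics (not the authors' implementation), the sketch below computes both for a single matched point pair. The function names and the blending weight `w_plane` are hypothetical, and the normal is assumed to be unit length.

```python
import math


def _sub(a, b):
    """Component-wise difference of two 3D points."""
    return [x - y for x, y in zip(a, b)]


def _dot(a, b):
    """Dot product of two 3D vectors."""
    return sum(x * y for x, y in zip(a, b))


def point_to_point(p, q):
    """Euclidean distance between a source point p and its matched target point q."""
    d = _sub(p, q)
    return math.sqrt(_dot(d, d))


def point_to_plane(p, q, n):
    """Distance from p to the plane through q with unit normal n."""
    return abs(_dot(_sub(p, q), n))


def combined_residual(p, q, n, w_plane=0.5):
    """Blend of both metrics; w_plane is a hypothetical weight, not a value from the paper."""
    return (1.0 - w_plane) * point_to_point(p, q) + w_plane * point_to_plane(p, q, n)
```

For a source point that has slid along the target surface, for example `p = [3.0, 4.0, 0.0]`, `q = [0.0, 0.0, 0.0]`, `n = [0.0, 0.0, 1.0]`, the point-to-point distance is 5.0 while the point-to-plane distance is 0.0. The point-to-plane term ignores tangential offsets, which is one common motivation for using the two metrics together.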

Paper Structure

This paper contains 21 sections, 14 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: Reconstruction and Rendering Results. Compared with 3DGS-based SLAM, our method utilizes 2DGS for geometry-accurate scene representation, producing high-quality novel-view images, view-consistent depth maps, and precise surface normals.
  • Figure 2: Overview of Endo-2DTAM. Our proposed system consists of three modules: the tracking module, the mapping module, and the bundle adjustment. The tracking module takes the incoming RGBD frame as input and tracks the camera pose. Then the frame is added to the candidate list for the pose-consistent keyframe selection. In the mapping module, we first expand 2D Gaussians with the new frame and then update 2D Gaussians with the selected keyframes. The selected keyframes are also used for bundle adjustment for joint optimization of poses and 2D Gaussians.
  • Figure 3: Qualitative Results on C3VD [bobrow2023]. We compare our method with the SOTA EndoGSLAM [wang2024endogslam] for dense endoscopic SLAM. Our method generates more robust color and depth reconstruction, as shown by results from cecum_t2_b and sigmoid_t2_a. Our method also estimates a more precise trajectory, as demonstrated by results from cecum_t3_a.
  • Figure 4: Surface Normal Comparison of Mapping Ablation. We compare the rendered surface normals obtained with different modalities as supervision. Results demonstrate that combining color, depth, and normal supervision achieves the best quality.
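
The module ordering described in the Figure 2 caption can be sketched as a simple control loop. This is only an outline under stated assumptions: every callable passed in (`track`, `select_keyframes`, `expand`, `update`, `bundle_adjust`) is a hypothetical placeholder, the keyframe selection criterion is not specified here, and running bundle adjustment on every frame is an assumption rather than something the caption states.

```python
def run_pipeline(frames, track, select_keyframes, expand, update, bundle_adjust):
    """Order the steps as in the Figure 2 caption; all callables are hypothetical stubs."""
    candidates = []  # (frame, pose) pairs eligible for keyframe selection
    for frame in frames:
        pose = track(frame)                      # tracking: estimate pose of the RGBD frame
        candidates.append((frame, pose))         # add the frame to the candidate list
        keyframes = select_keyframes(candidates) # pose-consistent selection (criterion assumed)
        expand(frame, pose)                      # mapping: expand 2D Gaussians with the new frame
        update(keyframes)                        # mapping: update 2D Gaussians with keyframes
        bundle_adjust(keyframes)                 # BA: jointly refine poses and 2D Gaussians
    return candidates
```

Passing the modules in as callables keeps the sketch independent of any particular renderer or optimizer, so each stage can be replaced by a stub when checking the call order.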