Endo-G$^{2}$T: Geometry-Guided & Temporally Aware Time-Embedded 4DGS For Endoscopic Scenes

Yangle Liu, Fengze Li, Kan Liu, Jieming Ma

TL;DR

Endo-G$^{2}$T, a geometry-guided and temporally aware training scheme for time-embedded 4D Gaussian splatting (4DGS), achieves state-of-the-art results among monocular reconstruction baselines.

Abstract

Endoscopic (endo) video exhibits strong view-dependent effects such as specularities, wet reflections, and occlusions. Purely photometric supervision is poorly aligned with geometry and triggers early geometric drift, in which erroneous shapes are reinforced during densification and become hard to correct. We ask how to anchor geometry early in 4D Gaussian splatting (4DGS) while maintaining temporal consistency and efficiency in dynamic endoscopic scenes. To this end, we present Endo-G$^{2}$T, a geometry-guided and temporally aware training scheme for time-embedded 4DGS. First, geo-guided prior distillation converts confidence-gated monocular depth into supervision through scale-invariant depth and depth-gradient losses. A warm-up-to-cap schedule injects these priors softly and avoids early overfitting. Second, a time-embedded Gaussian field represents dynamics in XYZT with a rotor-like rotation parameterization. This yields temporally coherent geometry, with lightweight regularization that favors smooth motion and crisp opacity boundaries. Third, keyframe-constrained streaming improves efficiency and long-horizon stability by concentrating optimization on keyframes under a max-points budget, while non-keyframes advance with lightweight updates. On the EndoNeRF and StereoMIS-P1 datasets, Endo-G$^{2}$T achieves state-of-the-art results among monocular reconstruction baselines.
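
The depth-prior stage can be illustrated with a minimal sketch. The abstract does not give the exact loss definitions, so the code below assumes common choices:

- a scale-invariant log-depth loss with $\lambda = 1$, i.e. the variance of $d_i = \log \hat{D}_i - \log D_i$ over the gated pixels;
- an L1 penalty on forward-difference gradients of log depth;
- a hard confidence gate $c_i \ge \tau$;
- a linear warm-up to a capped weight.

The names `tau`, `warmup_steps`, `cap`, `grad_weight`, and `depth_prior_loss`, and all default values, are hypothetical and are not taken from the paper.

```python
import math
from typing import List

Grid = List[List[float]]  # row-major H x W depth or confidence map


def si_log_depth_loss(pred: Grid, prior: Grid, conf: Grid, tau: float = 0.5) -> float:
    """Scale-invariant log-depth loss (lambda = 1) on confidence-gated pixels.

    Equals the variance of log(pred) - log(prior) over pixels with conf >= tau,
    so it is zero whenever pred = s * prior for any global scale s > 0.
    """
    d = [
        math.log(p) - math.log(q)
        for p_row, q_row, c_row in zip(pred, prior, conf)
        for p, q, c in zip(p_row, q_row, c_row)
        if c >= tau
    ]
    if not d:
        return 0.0
    mean = sum(d) / len(d)
    return sum((x - mean) ** 2 for x in d) / len(d)


def depth_gradient_loss(pred: Grid, prior: Grid, conf: Grid, tau: float = 0.5) -> float:
    """Mean L1 gap between forward-difference gradients of log depth.

    Log-space differences are also invariant to a global depth scale.
    A gradient term counts only if both of its pixels pass the gate.
    """
    h, w = len(pred), len(pred[0])
    total, count = 0.0, 0
    for y in range(h):
        for x in range(w):
            for y2, x2 in ((y, x + 1), (y + 1, x)):  # right and down neighbours
                if y2 >= h or x2 >= w:
                    continue
                if conf[y][x] < tau or conf[y2][x2] < tau:
                    continue
                g_pred = math.log(pred[y2][x2]) - math.log(pred[y][x])
                g_prior = math.log(prior[y2][x2]) - math.log(prior[y][x])
                total += abs(g_pred - g_prior)
                count += 1
    return total / count if count else 0.0


def prior_weight(step: int, warmup_steps: int = 1000, cap: float = 0.1) -> float:
    """Warm-up-to-cap schedule: ramp linearly from 0, then hold at `cap`."""
    return cap * min(1.0, step / warmup_steps)


def depth_prior_loss(pred: Grid, prior: Grid, conf: Grid, step: int,
                     grad_weight: float = 0.5) -> float:
    """Hypothetical combination of both terms under the schedule."""
    return prior_weight(step) * (
        si_log_depth_loss(pred, prior, conf)
        + grad_weight * depth_gradient_loss(pred, prior, conf)
    )
```

Both terms operate on log depth, so rescaling the rendered depth by any positive constant leaves them unchanged. That property lets a monocular prior of unknown scale still supervise shape. Low-confidence pixels, such as specular highlights, simply drop out of both sums.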
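
For the time-embedded Gaussian field, the sketch below shows one way a Gaussian defined in XYZT can become a 3D Gaussian at a given time. It rests on two assumptions that are not the authors' confirmed formulation:

- "Rotor-like" rotation refers to the standard pair-of-unit-quaternions parameterization of 4D rotations, $v \mapsto q_l\, v\, q_r$.
- Rendering at time $t$ conditions the 4D Gaussian on $t$:
  - $\mu_{xyz|t} = \mu_{xyz} + \Sigma_{xyz,t}\Sigma_{tt}^{-1}(t - \mu_t)$,
  - $\Sigma_{xyz|t} = \Sigma_{xyz} - \Sigma_{xyz,t}\Sigma_{tt}^{-1}\Sigma_{t,xyz}$,
  - opacity is scaled by $\exp(-\tfrac{1}{2}(t-\mu_t)^2/\Sigma_{tt})$.

All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def qmul(p: Sequence[float], q: Sequence[float]) -> Tuple[float, float, float, float]:
    """Hamilton product of quaternions stored as (w, x, y, z)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


def normalize(q: Sequence[float]) -> Tuple[float, ...]:
    n = math.sqrt(sum(v * v for v in q))
    return tuple(v / n for v in q)


def rotation_4d(q_left: Sequence[float], q_right: Sequence[float]) -> Matrix:
    """4x4 matrix of the rotation v -> q_left * v * q_right acting on (x, y, z, t)."""
    ql, qr = normalize(q_left), normalize(q_right)
    cols = []
    for k in range(4):
        e = [0.0, 0.0, 0.0, 0.0]
        e[k] = 1.0
        cols.append(qmul(qmul(ql, e), qr))  # image of the k-th basis vector
    return [[cols[c][r] for c in range(4)] for r in range(4)]


def covariance_4d(q_left: Sequence[float], q_right: Sequence[float],
                  scales: Sequence[float]) -> Matrix:
    """Sigma = R diag(scales^2) R^T for a 4D (XYZT) Gaussian."""
    R = rotation_4d(q_left, q_right)
    return [
        [sum(R[i][k] * scales[k] ** 2 * R[j][k] for k in range(4)) for j in range(4)]
        for i in range(4)
    ]


def slice_at_time(mean4: Sequence[float], cov4: Matrix, t: float):
    """Condition the XYZT Gaussian on time t (time is the last coordinate).

    Returns the 3D mean, the 3x3 covariance, and an unnormalized temporal
    weight exp(-0.5 (t - mu_t)^2 / Sigma_tt) that would scale opacity.
    """
    s_tt = cov4[3][3]
    s_xt = [cov4[i][3] for i in range(3)]
    dt = t - mean4[3]
    mean3 = [mean4[i] + s_xt[i] / s_tt * dt for i in range(3)]
    cov3 = [[cov4[i][j] - s_xt[i] * s_xt[j] / s_tt for j in range(3)] for i in range(3)]
    weight = math.exp(-0.5 * dt * dt / s_tt)
    return mean3, cov3, weight
```

In this parameterization, the cross terms $\Sigma_{xyz,t}$ make a Gaussian's center move linearly in $t$. A large time-axis scale keeps the Gaussian visible over many frames, which is one source of temporal coherence.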
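
Keyframe-constrained streaming can be pictured as a per-frame schedule. The sketch below is hypothetical throughout. The keyframe interval, the iteration counts, and the rule that only keyframes may densify (and never beyond `max_points`) are assumptions chosen to mirror the abstract's description; they are not values or rules reported in the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FramePlan:
    frame: int
    keyframe: bool
    iterations: int  # optimization steps spent on this frame
    new_points: int  # Gaussians allowed to be added on this frame


def plan_stream(
    num_frames: int,
    requested_new: List[int],
    init_points: int,
    max_points: int,
    key_interval: int = 5,
    key_iters: int = 200,
    light_iters: int = 20,
) -> List[FramePlan]:
    """Hypothetical keyframe-constrained streaming schedule.

    Keyframes receive the full optimization budget and may densify, but the
    total Gaussian count never exceeds max_points. Non-keyframes receive a
    short update and no densification.
    """
    points = init_points
    plan: List[FramePlan] = []
    for f in range(num_frames):
        is_key = f % key_interval == 0
        if is_key:
            added = max(0, min(requested_new[f], max_points - points))
            iters = key_iters
        else:
            added = 0
            iters = light_iters
        points += added
        plan.append(FramePlan(f, is_key, iters, added))
    return plan
```

For example, starting from 900 points with a 1000-point cap, the first keyframe fills the remaining budget and later keyframes add nothing. The model size stays bounded over long sequences.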

Paper Structure

This paper contains 7 sections, 12 equations, 3 figures, and 2 tables.

Figures (3)

  • Figure 1: Overview of Endo-G2T.
  • Figure 2: Comparison on EndoNeRF Cutting/Pulling: Endo-G2T produces sharper tissue boundaries and fewer floaters than other baselines.
  • Figure 3: Endo-G2T on EndoNeRF Cutting/Pulling from $t{=}0$ to $t{=}0.9$: GT, our renders, and reconstructed surfaces, showing coherent geometry and crisp opacity.