NRGS-SLAM: Monocular Non-Rigid SLAM for Endoscopy via Deformation-Aware 3D Gaussian Splatting

Jiwei Shan, Zeyu Cai, Yirui Li, Yongbo Chen, Lijun Han, Yun-hui Liu, Hesheng Wang, Shing Shin Cheng

TL;DR

NRGS-SLAM is a monocular non-rigid SLAM system for endoscopy built on 3D Gaussian Splatting. It introduces a deformation-aware 3D Gaussian map that augments each Gaussian primitive with a learnable deformation probability, optimized through a Bayesian self-supervision strategy that requires no external non-rigidity labels.

Abstract

Visual simultaneous localization and mapping (V-SLAM) is a fundamental capability for autonomous perception and navigation. However, endoscopic scenes violate the rigidity assumption due to persistent soft-tissue deformations, creating a strong coupling ambiguity between camera ego-motion and intrinsic deformation. Although recent monocular non-rigid SLAM methods have made notable progress, they often lack effective decoupling mechanisms and rely on sparse or low-fidelity scene representations, which leads to tracking drift and limited reconstruction quality. To address these limitations, we propose NRGS-SLAM, a monocular non-rigid SLAM system for endoscopy based on 3D Gaussian Splatting. To resolve the coupling ambiguity, we introduce a deformation-aware 3D Gaussian map that augments each Gaussian primitive with a learnable deformation probability, optimized via a Bayesian self-supervision strategy without requiring external non-rigidity labels. Building on this representation, we design a deformable tracking module that performs robust coarse-to-fine pose estimation by prioritizing low-deformation regions, followed by efficient per-frame deformation updates. A carefully designed deformable mapping module progressively expands and refines the map, balancing representational capacity and computational efficiency. In addition, a unified robust geometric loss incorporates external geometric priors to mitigate the inherent ill-posedness of monocular non-rigid SLAM. Extensive experiments on multiple public endoscopic datasets demonstrate that NRGS-SLAM achieves more accurate camera pose estimation (up to 50% reduction in RMSE) and higher-quality photo-realistic reconstructions than state-of-the-art methods. Comprehensive ablation studies further validate the effectiveness of our key design choices. Source code will be publicly available upon paper acceptance.
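
The idea of prioritizing low-deformation regions during pose estimation can be illustrated with a small sketch. It is not the paper's loss: the weight `1 - p`, the function name `deformation_weighted_loss`, and the toy inputs are assumptions chosen only to show how a per-pixel deformation probability could suppress residuals from likely non-rigid pixels.

```python
def deformation_weighted_loss(residuals, deform_prob, eps=1e-8):
    """Weighted mean of per-pixel residuals.

    residuals   : absolute errors between rendered and observed pixels
    deform_prob : per-pixel deformation probabilities in [0, 1]

    Assumption: each pixel is weighted by (1 - p), so pixels that are
    probably rigid dominate the pose objective. The paper's actual
    weighting scheme is not given in this summary.
    """
    if len(residuals) != len(deform_prob):
        raise ValueError("residuals and deform_prob must have the same length")
    weights = [1.0 - p for p in deform_prob]
    total_weight = sum(weights)
    if total_weight < eps:
        # Every pixel is considered deformable: nothing reliable to align to.
        return 0.0
    weighted_sum = sum(w * r for w, r in zip(weights, residuals))
    return weighted_sum / total_weight


# Toy example: the third pixel has a large error but is almost surely
# deforming tissue, so it should not pull the camera pose estimate.
residuals = [0.1, 0.1, 0.9]
deform_prob = [0.0, 0.0, 1.0]
loss = deformation_weighted_loss(residuals, deform_prob)
```

With these toy values the large residual on the deforming pixel receives zero weight, so the loss reflects only the two rigid pixels.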

Paper Structure (56 sections, 39 equations, 13 figures, 10 tables)

Figures (13)

  • Figure 1: (a) Illustration of the coupling ambiguity: in monocular non-rigid scenarios, pixel displacements arise from a complex combination of camera ego-motion and intrinsic tissue deformation. This creates a fundamental coupling ambiguity, making it difficult to distinguish rigid motion from non-rigid dynamics or their combination. (b) Existing methods often struggle to decouple these factors during joint optimization and rely on sparse representations (e.g., points or meshes), resulting in tracking drift and low visual fidelity. (c) Our proposed NRGS-SLAM introduces a deformation-aware 3D Gaussian map. By learning a per-primitive deformation probability (visualized by a color spectrum from blue for rigid to red for non-rigid), our system explicitly decouples camera tracking from deformation updates, enabling both accurate pose estimation and high-fidelity photo-realistic reconstruction.
  • Figure 2: Overview of our proposed NRGS-SLAM. Central to our approach is the deformation-aware 3D Gaussian map (see Sec. \ref{sec:map}), which represents the scene using canonical 3D Gaussians augmented by deformation probabilities and models temporal deformations via Gaussian Basis Functions. Given sequential RGB frames captured in a deformable environment, the measurement preprocessing module (see Sec. \ref{sec:mea}) extracts geometric priors and generates valid and co-visibility masks. Subsequently, deformable tracking (see Sec. \ref{sec:tracking}) performs frame-by-frame estimation of camera poses and scene deformations. Upon the insertion of a new keyframe, the deformable mapping module (see Sec. \ref{sec:mapping}) is triggered to expand and globally optimize the map.
  • Figure 3: Left: The canonical space is populated with 3D Gaussian primitives augmented by a learnable deformation probability $w_d$. Primitives in rigid regions (blue, $w_d \to 0$) remain static, while those in deformable tissues (red, $w_d \to 1$) are allowed to deform over time. Middle: These probabilistic attributes are projected via differentiable rasterization. Right: The resulting 2D deformation confidence map $\mathbf{M}_{\text{def}}$ visualizes the pixel-wise confidence of scene non-rigidity. It serves as a crucial weighting mask to decouple camera ego-motion from intrinsic scene deformation during tracking and mapping.
  • Figure 4: Overview of the measurement preprocessing pipeline. The module processes the incoming frame $I_t$ to extract geometric cues via a geometric foundation model (left) and determines valid and co-visibility masks (right).
  • Figure 5: Overview of the deformable tracking pipeline. Top: Deformation-Aware Camera Tracking estimates the camera pose in a coarse-to-fine manner. The first stage computes an initial pose using deformation-filtered 3D-2D correspondences. The second stage refines this pose by aligning the rendered frame and geometry with the current observations and geometric priors; crucially, this optimization is weighted by the rendered deformation probability map to prioritize reliable rigid regions. Bottom: Per-frame Deformation Estimation updates the deformation field to match the input frame. Guided by the deformation probabilities, we employ an efficient residual-based optimization to selectively capture non-rigid variations.
  • ...and 8 more figures
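
Two ideas from the captions above can be sketched in a few lines: projecting per-Gaussian deformation probabilities $w_d$ into a pixel of the confidence map $\mathbf{M}_{\text{def}}$ (Figure 3), and modeling a temporal offset with Gaussian basis functions (Figure 2). Both functions are illustrative assumptions rather than the authors' implementation. The first assumes that $w_d$ is blended with the same front-to-back alpha compositing that standard 3D Gaussian Splatting uses for color; the second assumes a scalar offset formed as a sum of Gaussian bumps over time with a shared width. All names and parameters are hypothetical.

```python
import math


def composite_deformation(w_d, alphas):
    """Blend per-Gaussian deformation probabilities for one pixel.

    w_d    : deformation probabilities of the Gaussians hit by the ray,
             sorted front to back
    alphas : the corresponding opacities after 2D projection, in [0, 1]

    Assumption: standard front-to-back alpha compositing,
    value = sum_i w_i * alpha_i * prod_{j<i} (1 - alpha_j).
    """
    transmittance = 1.0  # fraction of light not yet absorbed
    value = 0.0
    for w, a in zip(w_d, alphas):
        value += transmittance * a * w
        transmittance *= 1.0 - a
    return value


def gaussian_basis_offset(t, amplitudes, centers, sigma):
    """Scalar temporal offset as a sum of Gaussian basis functions.

    Assumption: offset(t) = sum_k a_k * exp(-(t - c_k)^2 / (2 * sigma^2)),
    with one shared width sigma. Applied per coordinate, this would give a
    time-varying displacement of a canonical Gaussian.
    """
    return sum(
        a * math.exp(-((t - c) ** 2) / (2.0 * sigma ** 2))
        for a, c in zip(amplitudes, centers)
    )


# A rigid Gaussian (w_d = 0) in front of a deformable one (w_d = 1),
# both with opacity 0.5 along the same ray.
pixel_confidence = composite_deformation([0.0, 1.0], [0.5, 0.5])

# One basis function centered at t = 1.0, evaluated at its center.
offset_at_peak = gaussian_basis_offset(1.0, [2.0], [1.0], 0.5)
```

In this toy case the deformable Gaussian sits behind a half-transparent rigid one, so it contributes only part of its probability to the pixel, and the offset evaluated at the center of a basis function equals that function's amplitude.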