
WaterSplat-SLAM: Photorealistic Monocular SLAM in Underwater Environment

Kangxu Wang, Shaofeng Zou, Chenxing Jiang, Yixiang Dai, Siang Chen, Shaojie Shen, Guijin Wang

Abstract

Underwater monocular SLAM is a challenging problem with applications ranging from autonomous underwater vehicles to marine archaeology. However, existing underwater SLAM methods struggle to produce maps with high-fidelity rendering. In this paper, we propose WaterSplat-SLAM, a novel monocular underwater SLAM system that achieves robust pose estimation and photorealistic dense mapping. Specifically, we couple semantic medium filtering into a two-view 3D reconstruction prior to enable underwater-adapted camera tracking and depth estimation. Furthermore, we present a semantic-guided rendering and adaptive map management strategy with an online medium-aware Gaussian map, modeling the underwater environment in a photorealistic and compact manner. Experiments on multiple underwater datasets demonstrate that WaterSplat-SLAM achieves robust camera tracking and high-fidelity rendering in underwater environments.


Paper Structure

This paper contains 17 sections, 19 equations, 6 figures, 8 tables.

Figures (6)

  • Figure 1: System overview of WaterSplat-SLAM: The system takes an RGB sequence as input and generates an online medium-aware Gaussian map. The RGB sequence is fed into the segmentation model to produce semantic segmentations of the images. The object regions are then passed to the Camera Tracking and Pointmap Estimation module to generate keyframe poses. Gaussian primitives are initialized from the keyframe poses and object regions. For each encoded ray vector, the Medium Network predicts the medium parameters. The Gaussian map is optimized using a semantic-guided photometric loss. Upon loop closure, we also perform adaptive adjustment and merging of Gaussian primitives.
  • Figure 2: Illustration of medium-aware Gaussian mapping: Encoded ray vectors are passed through the Medium Network to predict three medium attributes: $\sigma^{\text{attn}}$ (attenuation density), $\sigma^{\text{bs}}$ (backscatter density), and $c^{\text{med}}$ (medium color). During rendering, the contributions of the medium and objects are explicitly separated.
  • Figure 3: Gaussian primitives merging pipeline: When consecutive keyframes $k, k+1$ detect loop closure with history frame $j$, we extract their anchored Gaussian primitives and establish a 3D voxel grid. All primitives falling within the same voxel are then merged into a single Gaussian primitive.
  • Figure 4: Detailed comparison of reconstruction results for the Curacao, JapRedSea, and Panama sequences on the SeaThru-NeRF dataset. All three sequences exhibit the common challenges of turbid water and severely reduced visibility. For each sequence, we select a non-keyframe that is excluded from the training views of all methods. Specific regions are zoomed in to highlight reconstruction details. WaterSplat-SLAM shows high-fidelity reconstruction and fine detail, demonstrating its strong capability to clearly render both foreground and background objects despite the challenging underwater conditions.
  • Figure 5: Detailed reconstruction comparisons for the Big_gate, Pipe_local, and Pool_up2 sequences on our dataset. For each sequence, we select a non-keyframe that is excluded from the training views of all methods. Specific regions are zoomed in to highlight reconstruction details.
  • ...and 1 more figure
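
The medium-aware rendering in Figure 2 separates the object and medium contributions along each ray, using the predicted attenuation density $\sigma^{\text{attn}}$, backscatter density $\sigma^{\text{bs}}$, and medium color $c^{\text{med}}$. Below is a minimal NumPy sketch of this style of underwater image formation (direct signal attenuated by the medium plus depth-dependent backscatter), assuming scalar densities and a per-channel medium color; the function name and parameterization are illustrative and not the paper's actual API.

```python
import numpy as np

def render_underwater(obj_color, depth, sigma_attn, sigma_bs, c_med):
    """Compose a pixel color from separated object and medium terms.

    obj_color:  (3,) rendered object radiance along the ray
    depth:      scalar ray travel distance through the medium
    sigma_attn: scalar attenuation density (illustrative, per-ray)
    sigma_bs:   scalar backscatter density
    c_med:      (3,) medium color predicted for this ray
    """
    # Direct component: object radiance exponentially attenuated with distance
    direct = obj_color * np.exp(-sigma_attn * depth)
    # Backscatter component: medium color accumulated as distance grows
    backscatter = c_med * (1.0 - np.exp(-sigma_bs * depth))
    return direct + backscatter
```

At zero depth the model returns the unattenuated object color, while at large depths the output converges to the medium color, matching the intuition that distant content in turbid water fades into a uniform veil.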
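
Figure 3's merging step bins Gaussian primitives into a 3D voxel grid and collapses all primitives sharing a voxel into one. A minimal sketch of that binning-and-merging idea, assuming each primitive is reduced to a weighted mean position (the paper's primitives carry more attributes, and the helper name is hypothetical):

```python
import numpy as np

def merge_gaussians_by_voxel(means, weights, voxel_size=0.1):
    """Merge Gaussian primitives that fall into the same voxel.

    means:   (N, 3) array of primitive centers
    weights: (N,) per-primitive weights (e.g. opacities)
    Returns the (M, 3) weighted-average centers, one per occupied voxel.
    """
    # Quantize each center to an integer voxel key
    keys = np.floor(means / voxel_size).astype(np.int64)
    merged = {}
    for key, mu, w in zip(map(tuple, keys), means, weights):
        if key not in merged:
            merged[key] = [np.zeros(3), 0.0]
        merged[key][0] += w * mu   # accumulate weighted position
        merged[key][1] += w        # accumulate total weight
    # One weighted-average center per occupied voxel
    return np.array([acc / w for acc, w in merged.values()])
```

This kind of voxel hashing keeps the merge linear in the number of primitives, which matters when loop closure touches many overlapping keyframe submaps.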