SWAGSplatting: Semantic-guided Water-scene Augmented Gaussian Splatting

Zhuodong Jiang, Haoran Wang, Guoxi Huang, Brett Seymour, Nantheera Anantrasirichai

TL;DR

Underwater 3D reconstruction suffers from light attenuation and scattering, which degrade both fidelity and semantic coherence. The paper introduces SWAGSplatting, a semantic-guided 3D Gaussian Splatting framework in which every Gaussian carries a learnable semantic feature $f_s$ that is aligned with region-level CLIP embeddings through a semantic consistency loss. It also uses a stage-wise optimisation schedule and a Gaussian primitives reallocation strategy that balances the point cloud and adds detail in high-error areas. On SeaThru-NeRF and Submerged3D, SWAGSplatting achieves up to a 3.48 dB PSNR improvement over state-of-the-art methods, with consistent gains in SSIM and LPIPS, giving more accurate and semantically coherent underwater reconstructions for marine perception.
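To put the 3.48 dB figure in perspective, the short sketch below uses the standard PSNR definition, 10 · log10(MAX² / MSE), to convert a PSNR gain into the equivalent reduction in mean squared error. The baseline MSE value and the function names are illustrative assumptions, not numbers or code from the paper.

```python
import math


def psnr(mse: float, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB for a given mean squared error."""
    if mse <= 0:
        raise ValueError("mse must be positive")
    return 10.0 * math.log10(max_val ** 2 / mse)


def mse_ratio_for_gain(gain_db: float) -> float:
    """Factor by which the MSE shrinks when PSNR rises by gain_db."""
    return 10.0 ** (gain_db / 10.0)


# Hypothetical baseline error, chosen only for illustration (30 dB at max_val=1).
baseline_mse = 1e-3
ratio = mse_ratio_for_gain(3.48)
improved_mse = baseline_mse / ratio
gain = psnr(improved_mse) - psnr(baseline_mse)

print(f"MSE reduction factor: {ratio:.2f}")
print(f"PSNR gain: {gain:.2f} dB")
```

Under this definition, a 3.48 dB gain corresponds to a mean squared error roughly 2.2 times lower than the baseline, regardless of the baseline value chosen.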

Abstract

Accurate 3D reconstruction in underwater environments remains a challenging task due to light attenuation, scattering, and limited visibility. While recent AI-based approaches have advanced underwater imaging, they often overlook high-level semantic understanding, which is crucial for reconstructing complex scenes. In this paper, we propose SWAGSplatting, Semantic-guided Water-scene Augmented Gaussian Splatting, a novel multimodal framework that integrates language and vision knowledge into 3D Gaussian Splatting for robust and high-fidelity underwater reconstruction. Each Gaussian primitive is augmented with a learnable semantic feature, supervised using CLIP-based embeddings extracted from region-level semantic cues. A dedicated semantic consistency loss enforces alignment between geometric reconstruction and scene semantics. In addition, a stage-wise optimisation strategy combining coarse-to-fine learning with late-stage parameter refinement improves training stability and visual quality. Furthermore, we propose a 3D Gaussian Primitives Reallocation strategy to address the imbalanced distribution of primitives introduced by naive point cloud densification. Extensive experiments on the SeaThru-NeRF and Submerged3D datasets demonstrate that SWAGSplatting consistently outperforms state-of-the-art methods across PSNR, SSIM, and LPIPS metrics, achieving up to a 3.48 dB improvement in PSNR, enabling more accurate and semantically coherent underwater scene reconstruction for applications in marine perception and exploration.
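The abstract does not give the form of the semantic consistency loss. As a rough illustration of the idea only, the sketch below scores how well rendered per-pixel semantic features agree with the CLIP embedding of the region each pixel belongs to, using mean cosine distance. The choice of cosine distance, the function names, and the toy 2-D vectors are all assumptions made here for illustration; the paper's actual loss may differ, and real CLIP embeddings have far more dimensions.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_consistency_loss(
    rendered_features: Sequence[Sequence[float]],
    region_ids: Sequence[int],
    region_embeddings: Dict[int, Sequence[float]],
) -> float:
    """Mean cosine distance between each pixel's rendered feature and the
    embedding of its region (hypothetical formulation, not the paper's)."""
    if not rendered_features:
        raise ValueError("rendered_features must not be empty")
    if len(rendered_features) != len(region_ids):
        raise ValueError("one region id is required per rendered feature")
    total = 0.0
    for feature, region_id in zip(rendered_features, region_ids):
        total += 1.0 - cosine_similarity(feature, region_embeddings[region_id])
    return total / len(rendered_features)


# Toy example: two regions with orthogonal stand-in embeddings.
embeddings = {0: [1.0, 0.0], 1: [0.0, 1.0]}
aligned = semantic_consistency_loss([[2.0, 0.0], [0.0, 3.0]], [0, 1], embeddings)
mixed = semantic_consistency_loss([[2.0, 0.0], [3.0, 0.0]], [0, 1], embeddings)
print(aligned, mixed)
```

In this toy setup the loss is zero when every feature points in the same direction as its region embedding, and it grows as features drift toward the embedding of a different region, which is the behaviour a semantic alignment term is meant to penalise.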

Paper Structure

This paper contains 18 sections, 12 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Reconstruction performance comparison among 3DGS [kerbl:3Dgaussians:2023], UW-GS [wang2024uw], and the proposed SWAGSplatting. Relative to 3DGS and UW-GS, SWAGSplatting markedly suppresses artefacts and yields more faithful renderings.
  • Figure 2: Pipeline of SWAGSplatting. Yellow highlights indicate the proposed contributions: (1) a semantic-guided loss $L_s$ that enforces high-level structural consistency and improves reconstruction fidelity; (2) a stage-wise optimisation strategy that enhances both training stability and reconstruction quality; (3) 3D Gaussian primitives reallocation, which balances the point-cloud distribution and improves reconstruction with the same number of primitives.
  • Figure 3: Novel view rendering comparison. The first row shows the IUI-Redsea scene from the SeaThru-NeRF dataset, and the second row shows the Isro scene from the Submerged3D dataset. The third row shows the Tokai scene from Submerged3D on the left and the Japanese-Redsea scene on the right.