SWAGSplatting: Semantic-guided Water-scene Augmented Gaussian Splatting
Zhuodong Jiang, Haoran Wang, Guoxi Huang, Brett Seymour, Nantheera Anantrasirichai
TL;DR
Underwater 3D reconstruction suffers from light attenuation and scattering, which degrade both visual fidelity and semantic coherence. The paper introduces SWAGSplatting, a semantic-guided 3D Gaussian Splatting framework in which every Gaussian carries a learnable semantic feature $f_s$. This feature is aligned with region-level CLIP embeddings through a semantic consistency loss. The method also uses a stage-wise optimisation schedule and a Gaussian primitives reallocation strategy that rebalances the point cloud and adds detail in high-error areas. On SeaThru-NeRF and Submerged3D, SWAGSplatting achieves up to a 3.48 dB PSNR improvement and consistent gains in SSIM and LPIPS, yielding more accurate and semantically coherent underwater reconstructions for marine perception. Together, these contributions move underwater neural rendering toward object-aware, robust reconstruction with higher visual fidelity.
Abstract
Accurate 3D reconstruction in underwater environments remains a challenging task due to light attenuation, scattering, and limited visibility. While recent AI-based approaches have advanced underwater imaging, they often overlook high-level semantic understanding, which is crucial for reconstructing complex scenes. In this paper, we propose SWAGSplatting (Semantic-guided Water-scene Augmented Gaussian Splatting), a novel multimodal framework that integrates language and vision knowledge into 3D Gaussian Splatting for robust and high-fidelity underwater reconstruction. Each Gaussian primitive is augmented with a learnable semantic feature, supervised using CLIP-based embeddings extracted from region-level semantic cues. A dedicated semantic consistency loss enforces alignment between geometric reconstruction and scene semantics. In addition, a stage-wise optimisation strategy combining coarse-to-fine learning with late-stage parameter refinement improves training stability and visual quality. Furthermore, we propose a 3D Gaussian Primitives Reallocation strategy to address the imbalanced distribution of primitives introduced by naive point cloud densification. Extensive experiments on the SeaThru-NeRF and Submerged3D datasets demonstrate that SWAGSplatting consistently outperforms state-of-the-art methods across PSNR, SSIM, and LPIPS metrics, achieving up to a 3.48 dB improvement in PSNR. These gains enable more accurate and semantically coherent underwater scene reconstruction for applications in marine perception and exploration.
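To make the idea of a semantic consistency loss concrete, the following is a minimal sketch of one plausible form, not the formulation used in the paper. It assumes the per-Gaussian semantic features have already been rendered into a per-pixel feature map, that each region comes with a binary mask and a target embedding of the same dimension, and that alignment is measured by cosine distance. The function name, argument names, and shapes are all hypothetical.

```python
import numpy as np


def semantic_consistency_loss(feature_map, region_masks, region_embeddings, eps=1e-8):
    """Mean cosine distance between region-pooled features and target embeddings.

    All names and shapes are illustrative assumptions:
      feature_map:       (H, W, D) rendered semantic features
      region_masks:      (R, H, W) boolean masks, one per region
      region_embeddings: (R, D) target embedding per region (e.g. from CLIP)
    """
    losses = []
    for mask, target in zip(region_masks, region_embeddings):
        if not mask.any():
            continue  # ignore regions that cover no pixels
        # Average the rendered features over the pixels of this region.
        pooled = feature_map[mask].mean(axis=0)
        # Cosine similarity between the pooled feature and the target.
        denom = np.linalg.norm(pooled) * np.linalg.norm(target) + eps
        losses.append(1.0 - float(pooled @ target) / denom)
    # With no valid regions, the loss contributes nothing.
    return float(np.mean(losses)) if losses else 0.0
```

Under these assumptions the loss is close to 0 when every region's pooled feature points in the same direction as its target embedding and approaches 2 when they point in opposite directions. In a training setting the same computation would be written in an autodiff framework so that gradients flow back to the per-Gaussian features.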
