WaterMono: Teacher-Guided Anomaly Masking and Enhancement Boosting for Robust Underwater Self-Supervised Monocular Depth Estimation
Yilin Ding, Kunqian Li, Han Mei, Shuaixin Liu, Guojia Hou
TL;DR
WaterMono tackles underwater monocular depth estimation under challenging conditions where dynamic regions, image degradation, and diverse camera angles hinder self-supervised learning. It introduces a two-stage teacher-student framework with a Teacher-Guided Anomaly Mask (TGAM), Image Enhancement Boosting (IEB) based on a simplified Underwater Image Formation Model, selective distillation, and rotated distillation to boost rotational robustness. The approach yields state-of-the-art depth accuracy on the FLSea benchmark while also delivering visually enhanced images that maintain inter-frame consistency, and it demonstrates strong generalization to new underwater datasets without fine-tuning. Overall, WaterMono reveals a mutually beneficial coupling between depth estimation and underwater image enhancement, enabling more reliable vision-based navigation for AUVs/ROVs.
Abstract
Depth information is a crucial prerequisite for many visual tasks, whether on land or underwater. Recently, self-supervised methods have achieved remarkable performance on several terrestrial benchmarks despite the absence of depth annotations. In more challenging underwater scenarios, however, they face new obstacles: marine life breaks the static-scene assumption, and underwater image degradation lowers image quality. Camera angles in underwater images are also more diverse. We find that knowledge distillation is a promising approach to these challenges. In this paper, we propose WaterMono, a novel framework for depth estimation coupled with image enhancement. It incorporates the following key measures: (1) a Teacher-Guided Anomaly Mask that identifies dynamic regions within the images; (2) enhanced images generated from depth information combined with the Underwater Image Formation Model, which in turn benefit the depth estimation task; and (3) a rotated distillation strategy that improves the model's rotational robustness. Comprehensive experiments demonstrate the effectiveness of the proposed method for both depth estimation and image enhancement. The source code and pre-trained models are available on the project home page: https://github.com/OUCVisionGroup/WaterMono.
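To make measure (2) concrete, the sketch below shows how a depth map can drive image enhancement through an image formation model. It assumes the commonly used simplified form I = J * t + B * (1 - t) with transmission t = exp(-beta * d); the abstract does not state the exact simplification WaterMono uses, so this form, the function name `enhance_with_uifm`, and the parameters `beta`, `backscatter`, and `t_min` are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np


def enhance_with_uifm(image, depth, beta, backscatter, t_min=0.1):
    """Recover a restored image J from an observed image I and a depth map.

    Assumed model (not confirmed by the source): I = J * t + B * (1 - t),
    where t = exp(-beta * depth) is the per-channel transmission.

    image:       (H, W, 3) array with values in [0, 1]
    depth:       (H, W) array of scene depth
    beta:        3 per-channel attenuation coefficients (hypothetical values)
    backscatter: 3 per-channel background light values B (hypothetical values)
    t_min:       lower bound on t to avoid amplifying noise at large depth
    """
    image = np.asarray(image, dtype=np.float64)
    depth = np.asarray(depth, dtype=np.float64)
    beta = np.asarray(beta, dtype=np.float64).reshape(1, 1, -1)
    backscatter = np.asarray(backscatter, dtype=np.float64).reshape(1, 1, -1)

    # Per-pixel, per-channel transmission from depth.
    t = np.exp(-beta * depth[..., None])
    t = np.clip(t, t_min, 1.0)

    # Invert the formation model: J = (I - B * (1 - t)) / t.
    restored = (image - backscatter * (1.0 - t)) / t
    return np.clip(restored, 0.0, 1.0)
```

In a pipeline of this kind, `depth` would come from the depth network's prediction, so better depth yields better restoration, and the restored frames can in turn serve as cleaner training inputs. That is the coupling the abstract describes, under the assumptions stated above.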
