Empowering DINO Representations for Underwater Instance Segmentation via Aligner and Prompter
Zhiyang Chen, Chen Zhang, Hao Fang, Runmin Cong
TL;DR
DiveSeg tackles underwater instance segmentation by fine-tuning a powerful foundation model, DINOv2, through two lightweight adapters: AquaStyle Aligner, which captures and injects underwater color style via Fourier amplitude and cross-attention, and ObjectPrior Prompter, which provides object-level priors using binary masks to guide instance learning. The framework yields state-of-the-art performance on UIIS and USIS10K, with substantial gains in mAP, AP50, and AP75 while keeping a modest parameter footprint. Ablation confirms the complementary benefits of both modules, and qualitative results show sharper boundaries and better handling of cluttered underwater scenes. Overall, DiveSeg demonstrates the practicality of foundation-model-based UIS with targeted, efficient domain adaptation for marine exploration and ecological protection.
Abstract
Underwater instance segmentation (UIS), which integrates pixel-level understanding with instance-level discrimination, is a pivotal technology in marine resource exploration and ecological protection. In recent years, large-scale pretrained visual foundation models, exemplified by DINO, have advanced rapidly and demonstrated remarkable performance on complex downstream tasks. In this paper, we demonstrate that DINO can serve as an effective feature learner for UIS, and we introduce DiveSeg, a novel framework built upon two insightful components: (1) the AquaStyle Aligner, which embeds underwater color style features into the DINO fine-tuning process to facilitate better adaptation to the underwater domain; and (2) the ObjectPrior Prompter, which incorporates binary segmentation-based prompts to deliver object-level priors, providing essential guidance for the instance segmentation task, which requires both object- and instance-level reasoning. We conduct thorough experiments on the popular UIIS and USIS10K datasets, and the results show that DiveSeg achieves state-of-the-art performance. Code: https://github.com/ettof/Diveseg.
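
The TL;DR notes that the AquaStyle Aligner captures underwater color style via the Fourier amplitude. As a rough illustration of why the amplitude spectrum is a plausible carrier of color style, the sketch below splits an image into per-channel amplitude and phase with NumPy and applies a simple color cast. It is not the authors' implementation: the function name `fourier_amplitude_phase`, the random test image, and the per-channel gain used as a stand-in for an underwater color cast are all assumptions made for this example.

```python
import numpy as np


def fourier_amplitude_phase(image):
    """Split an H x W x C image into per-channel Fourier amplitude and phase.

    Hypothetical helper for illustration; not taken from the DiveSeg code.
    """
    # 2-D FFT over the spatial axes, computed independently for each channel.
    spectrum = np.fft.fft2(image, axes=(0, 1))
    return np.abs(spectrum), np.angle(spectrum)


# Assumed toy input: a random RGB image with values in [0, 1).
rng = np.random.default_rng(0)
image = rng.random((8, 8, 3))

# Assumed stand-in for an underwater color cast: attenuate red most strongly.
gain = np.array([0.4, 0.9, 1.0])
tinted = image * gain

amp, phase = fourier_amplitude_phase(image)
amp_tinted, phase_tinted = fourier_amplitude_phase(tinted)

# The cast rescales the amplitude of each channel by its gain ...
amplitude_scaled = np.allclose(amp_tinted, amp * gain)
# ... while the phase is unchanged (compared as unit phasors so that
# values near +pi and -pi are treated as the same angle).
phase_preserved = np.allclose(np.exp(1j * phase_tinted), np.exp(1j * phase))
```

In this toy setting, a global per-channel color shift changes only the amplitude and leaves the phase untouched, which is the intuition behind treating amplitude as a style descriptor. How DiveSeg injects that information into DINO through cross-attention is not specified in this section and is not modeled here.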
