Fantastic Animals and Where to Find Them: Segment Any Marine Animal with Dual SAM

Pingping Zhang, Tianyu Yan, Yang Liu, Huchuan Lu

TL;DR

This work addresses Marine Animal Segmentation (MAS) by adapting the Segment Anything Model (SAM) to underwater domains. It introduces a Dual-SAM Encoder to inject marine priors via gamma-corrected imagery and adapters, and pairs it with Multi-level Coupled Prompts, a Dilated Fusion Attention Module, and Criss-Cross Connectivity Prediction to capture structured connectivity beyond pixel-wise masks. Pseudo-label Mutual Supervision enables mutual refinement between dual decoders, yielding consistent, state-of-the-art MAS performance across five datasets. The approach demonstrates strong transferability and robustness to zero-shot scenarios, highlighting SAM's potential when domain-specific priors and decoding strategies are embedded for underwater perception.
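The gamma-corrected imagery mentioned above can be illustrated with a standard power-law transform. The following is a minimal sketch, not the authors' implementation: the gamma value of 0.5, the nested-list image representation, and the helper names `gamma_correct` and `make_dual_inputs` are all assumptions made for illustration. It only shows how an original image and a gamma-corrected copy might be paired as inputs for a dual-branch encoder.

```python
def gamma_correct(image, gamma=0.5):
    """Apply power-law (gamma) correction to an image with values in [0, 1].

    `image` is a nested list of floats (rows of pixels). A gamma below 1
    brightens dark regions; the default of 0.5 is an illustrative assumption,
    not a value taken from the paper.
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    return [[pixel ** gamma for pixel in row] for row in image]


def make_dual_inputs(image, gamma=0.5):
    """Return an (original, gamma-corrected) pair for a two-branch encoder.

    Hypothetical helper: the pairing scheme here is an assumption.
    """
    return image, gamma_correct(image, gamma)


# A tiny 2x2 "underwater" image with mostly dark pixels.
dark = [[0.04, 0.25], [0.64, 1.0]]
original, corrected = make_dual_inputs(dark, gamma=0.5)
```

With a gamma below 1, dark pixels are lifted more than bright ones, which is why this kind of correction is often used on low-contrast underwater images.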

Abstract

As an important pillar of underwater intelligence, Marine Animal Segmentation (MAS) involves segmenting animals within marine environments. Previous methods struggle to extract long-range contextual features and overlook the connectivity between discrete pixels. Recently, the Segment Anything Model (SAM) has offered a universal framework for general segmentation tasks. Unfortunately, because it is trained on natural images, SAM lacks prior knowledge of marine images. In addition, SAM's single-position prompt is insufficient for prior guidance. To address these issues, we propose a novel feature learning framework, named Dual-SAM, for high-performance MAS. To this end, we first introduce a dual structure with SAM's paradigm to enhance feature learning of marine images. Then, we propose a Multi-level Coupled Prompt (MCP) strategy to provide comprehensive underwater prior information, and enhance the multi-level features of SAM's encoder with adapters. Subsequently, we design a Dilated Fusion Attention Module (DFAM) to progressively integrate multi-level features from SAM's encoder. Finally, instead of directly predicting the masks of marine animals, we propose a Criss-Cross Connectivity Prediction (C$^3$P) paradigm to capture the inter-connectivity between discrete pixels. With dual decoders, it generates pseudo-labels and achieves mutual supervision for complementary feature representations, resulting in considerable improvements over previous techniques. Extensive experiments verify that our proposed method achieves state-of-the-art performance on five widely used MAS datasets. The code is available at https://github.com/Drchip61/Dual_SAM.
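To make the idea of predicting connectivity rather than a plain mask more concrete, the sketch below converts a binary mask into per-direction connectivity labels. It is a generic illustration under stated assumptions, not the paper's C$^3$P formulation, which this summary does not spell out: the restriction to the four horizontal and vertical neighbours, the channel names, and the function `connectivity_labels` are all hypothetical.

```python
# Offsets for the four horizontal and vertical ("criss-cross") neighbours.
# Restricting to these four directions is an assumption for illustration.
OFFSETS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def connectivity_labels(mask):
    """Build one binary channel per direction from a binary mask.

    A channel holds 1 at (r, c) when both the pixel and its neighbour in
    that direction are foreground, and 0 otherwise (including at borders).
    """
    height, width = len(mask), len(mask[0])
    labels = {}
    for name, (dr, dc) in OFFSETS.items():
        channel = [[0] * width for _ in range(height)]
        for r in range(height):
            for c in range(width):
                nr, nc = r + dr, c + dc
                inside = 0 <= nr < height and 0 <= nc < width
                if inside and mask[r][c] == 1 and mask[nr][nc] == 1:
                    channel[r][c] = 1
        labels[name] = channel
    return labels


# A small L-shaped foreground region.
mask = [
    [0, 1, 1],
    [0, 1, 0],
    [0, 0, 0],
]
labels = connectivity_labels(mask)
```

Unlike a pixel-wise mask, each channel here encodes a relation between two pixels, so a model trained on such targets is pushed to agree on whether neighbouring pixels belong to the same object.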

Paper Structure (22 sections, 30 equations, 13 figures, 14 tables)

Figures (13)

  • Figure 1: Our inspirations and advantages. (a) Single-position prompt of SAM. (b) Our multi-level prompt. (c) Mutual supervision for our Dual-SAM's decoders. (d) Our Dual-SAM delivers high performance on multiple datasets.
  • Figure 2: The overall framework of our proposed approach. It contains five main components: Dual-SAM Encoder (DSE), Multi-level Coupled Prompt (MCP), Dilated Fusion Attention Module (DFAM), Criss-Cross Connectivity Prediction (C$^3$P) and Pseudo-label Mutual Supervision (PMS). Our framework significantly improves Marine Animal Segmentation (MAS) with SAM.
  • Figure 3: Our proposed Multi-level Coupled Prompt (MCP).
  • Figure 4: Our Dilated Fusion Attention Module (DFAM).
  • Figure 5: Our Criss-Cross Connectivity Prediction (C$^3$P).
  • ...and 8 more figures