HFP-SAM: Hierarchical Frequency Prompted SAM for Efficient Marine Animal Segmentation

Pingping Zhang, Tianyu Yan, Yuhao Wang, Yang Liu, Tongdan Tang, Yili Ma, Long Lv, Feng Tian, Weibing Sun, and Huchuan Lu

Abstract

Marine Animal Segmentation (MAS) aims to identify and segment marine animals in complex marine environments. Most previous deep learning-based MAS methods struggle to model long-distance dependencies. Recently, the Segment Anything Model (SAM) has gained popularity in general image segmentation. However, it is weak at perceiving fine-grained details and frequency information. To this end, we propose a novel learning framework, named Hierarchical Frequency Prompted SAM (HFP-SAM), for high-performance MAS. First, we design a Frequency Guided Adapter (FGA) to efficiently inject marine scene information into the frozen SAM backbone through frequency domain prior masks. Additionally, we introduce a Frequency-aware Point Selection (FPS) module to generate highlighted regions through frequency analysis. These regions are combined with the coarse predictions of SAM to generate point prompts, which are fed into SAM's decoder for fine predictions. Finally, to obtain comprehensive segmentation masks, we introduce a Full-View Mamba (FVM) to efficiently extract spatial and channel contextual information with linear computational complexity. Extensive experiments on four public datasets demonstrate the superior performance of our approach. The source code is publicly available at https://github.com/Drchip61/TIP-HFP-SAM.
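
To make the point-prompt idea concrete, the following is a minimal, dependency-free sketch of frequency-guided point selection. It is not the authors' FPS implementation: the single-level Haar transform, the multiplicative fusion with the coarse mask, the top-k rule, and the names `haar_high_freq` and `select_point_prompts` are all assumptions chosen for illustration. The released repository is the reference for the actual method.

```python
def haar_high_freq(img):
    """Single-level 2D Haar transform on a grayscale image (list of rows).

    Returns a half-resolution map holding the summed magnitude of the three
    detail sub-bands for every 2x2 block. (Illustrative choice, not from the paper.)
    """
    h, w = len(img), len(img[0])
    energy = []
    for i in range(0, h - 1, 2):
        row = []
        for j in range(0, w - 1, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            row_diff = (a + b - c - d) / 2.0  # top row vs. bottom row
            col_diff = (a - b + c - d) / 2.0  # left column vs. right column
            diag = (a - b - c + d) / 2.0      # diagonal detail
            row.append(abs(row_diff) + abs(col_diff) + abs(diag))
        energy.append(row)
    return energy


def select_point_prompts(img, coarse_mask, k=3):
    """Pick up to k (x, y) points where detail energy and coarse mask agree.

    coarse_mask holds foreground probabilities in [0, 1] at image resolution.
    Assumption: fusion is a plain product of energy and mean block probability.
    """
    freq = haar_high_freq(img)
    scored = []
    for i, row in enumerate(freq):
        for j, e in enumerate(row):
            y, x = 2 * i, 2 * j
            # Average the coarse prediction over the same 2x2 block.
            m = (coarse_mask[y][x] + coarse_mask[y][x + 1]
                 + coarse_mask[y + 1][x] + coarse_mask[y + 1][x + 1]) / 4.0
            scored.append((e * m, y, x))
    scored.sort(key=lambda t: t[0], reverse=True)
    # Keep only blocks with a positive fused score; return (x, y) pairs.
    return [(x, y) for score, y, x in scored[:k] if score > 0]


if __name__ == "__main__":
    image = [[0, 0, 0, 0],
             [0, 0, 0, 0],
             [0, 0, 9, 0],
             [0, 0, 0, 0]]
    mask = [[1.0] * 4 for _ in range(4)]
    print(select_point_prompts(image, mask, k=1))
```

In a real pipeline the returned coordinates would be passed to SAM's prompt encoder as foreground points. The product fusion simply suppresses high-frequency responses that fall outside the coarse prediction.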

Paper Structure (21 sections, 17 equations, 16 figures, 17 tables)

Figures (16)

  • Figure 1: Our motivations and advantages. The first row shows the input image, the ground truth, and the prediction masks obtained with a point prompt and a box prompt, respectively. The second row shows the component maps obtained from the wavelet transform and our combined frequency map. The third row compares existing point prompt methods with our proposed method.
  • Figure 2: The framework of our proposed HFP-SAM. It consists of three main components: Frequency Guided Adapter (FGA), Frequency-aware Point Selection (FPS), and Full-View Mamba (FVM). HFP-SAM leverages frequency domain information to automatically obtain efficient prompts. Additionally, HFP-SAM uses the FVM to fully combine frequency and spatial domain information.
  • Figure 3: Illustration of the Frequency Guided Adapter.
  • Figure 4: Illustration of the Full-View Mamba.
  • Figure 5: Pairwise dataset distances measured by W1 and MMD-RBF.
  • ...and 11 more figures