ClassWise-SAM-Adapter: Parameter Efficient Fine-tuning Adapts Segment Anything to SAR Domain for Semantic Segmentation

Xinyang Pu; Hecheng Jia; Linghao Zheng; Feng Wang; Feng Xu

TL;DR

The paper tackles SAR landcover semantic segmentation by adapting the Segment Anything Model (SAM) through parameter-efficient fine-tuning. It freezes SAM's encoder and adds lightweight adapters, a classwise mask decoder, and a task-specific input module that injects low-frequency SAR features, enabling accurate multi-class segmentation with reduced training resources. Empirical results on FUSAR-Map1.0 and FUSAR-Map2.0 show that ClassWise-SAM-Adapter delivers state-of-the-art performance while keeping the number of trainable parameters small, which supports the practicality of deploying visual foundation models in the SAR domain. The work demonstrates the potential of SAM-based approaches for remote sensing tasks and motivates further exploration of foundation-model adaptation for SAR interpretation.

Abstract

In the realm of artificial intelligence, the emergence of foundation models, backed by high computing capabilities and extensive data, has been revolutionary. The Segment Anything Model (SAM), built on the Vision Transformer (ViT) with millions of parameters and trained on the vast SA-1B dataset, excels in various segmentation scenarios owing to its rich semantic information and generalization ability. This success of a visual foundation model has stimulated continued research on specific downstream tasks in computer vision. The ClassWise-SAM-Adapter (CWSAM) is designed to adapt the high-performing SAM for landcover classification on space-borne Synthetic Aperture Radar (SAR) images. The proposed CWSAM freezes most of SAM's parameters and incorporates lightweight adapters for parameter-efficient fine-tuning, and a classwise mask decoder is designed to perform the semantic segmentation task. This adapt-tuning method allows efficient landcover classification of SAR images, balancing accuracy against computational demand. In addition, a task-specific input module injects low-frequency information from SAR images through MLP-based layers to improve model performance. In extensive experiments against conventional state-of-the-art semantic segmentation algorithms, CWSAM shows enhanced performance with fewer computing resources, highlighting the potential of leveraging foundation models like SAM for specific downstream tasks in the SAR domain. The source code is available at: https://github.com/xypu98/CWSAM.
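
To make the idea of parameter-efficient fine-tuning with lightweight adapters concrete, the following is a minimal NumPy sketch of a generic bottleneck adapter placed next to a frozen Transformer MLP. It is an illustration under stated assumptions, not the authors' implementation: the class name `BottleneckAdapter`, the sizes `dim` and `hidden`, the GELU nonlinearity, and the zero-initialised up-projection are all hypothetical choices, and the adapter structure actually used by CWSAM (see Figure 3 of the paper and the linked repository) may differ.

```python
import numpy as np

rng = np.random.default_rng(0)


def gelu(x):
    # Tanh approximation of the GELU activation.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


class BottleneckAdapter:
    """Generic adapter: down-project, nonlinearity, up-project, residual add.

    Hypothetical illustration; not taken from the CWSAM code base.
    """

    def __init__(self, dim, hidden):
        self.w_down = rng.normal(0.0, 0.02, size=(dim, hidden))
        self.b_down = np.zeros(hidden)
        # Assumption: a zero-initialised up-projection, so the adapter starts
        # as an identity mapping and does not disturb the frozen features.
        self.w_up = np.zeros((hidden, dim))
        self.b_up = np.zeros(dim)

    def __call__(self, x):
        return x + gelu(x @ self.w_down + self.b_down) @ self.w_up + self.b_up

    def num_params(self):
        return sum(p.size for p in (self.w_down, self.b_down, self.w_up, self.b_up))


dim, hidden = 768, 32  # hypothetical embedding width and bottleneck width

# Parameter count of one frozen ViT-style MLP (dim -> 4*dim -> dim, with biases).
frozen_mlp_params = dim * 4 * dim + 4 * dim + 4 * dim * dim + dim

adapter = BottleneckAdapter(dim, hidden)
tokens = rng.normal(size=(16, dim))  # 16 stand-in patch tokens
out = adapter(tokens)

# Only the adapter weights would be updated during fine-tuning.
trainable_fraction = adapter.num_params() / (adapter.num_params() + frozen_mlp_params)
print(f"adapter params: {adapter.num_params()}, trainable fraction: {trainable_fraction:.4f}")
```

With these assumed sizes, the adapter adds roughly one percent of the parameters of the frozen MLP it accompanies, which is the kind of trade-off between accuracy and computational demand the abstract refers to.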

Paper Structure (17 sections, 10 equations, 7 figures, 9 tables)

This paper contains 17 sections, 10 equations, 7 figures, 9 tables.

Introduction
Related works
Segment Anything
Parameter efficient fine-tuning of Segment Anything
Semantic Segmentation and SAR landcover classification
METHODOLOGY
Adapted Vision Transformer-based image encoder
Classwise mask decoder and loss function
Task specific input of low frequency SAR characteristics
EXPERIMENTAL RESULTS AND ANALYSIS
Experimental Dataset and Settings
Evaluation Protocol and Metrics
Implementation Details
Comparison with the State-of-the-Art
Ablation Study
...and 2 more sections

Figures (7)

Figure 1: The output masks of remote sensing images by Segment Anything
Figure 2: The architecture of the proposed Classwise SAM Adapter.
Figure 3: Structure of Transformer block and adapter.
Figure 4: Embedding pipeline of classwise mask decoder.
Figure 5: The structure of the task-specific input module.
...and 2 more figures
