
Knowledge Transfer and Domain Adaptation for Fine-Grained Remote Sensing Image Segmentation

Shun Zhang, Xuechao Zou, Kai Li, Congyan Lang, Shiying Wang, Pin Tao, Tengfei Cao

TL;DR

This work tackles domain shift in fine-grained remote sensing (RS) image segmentation with KTDA, an end-to-end framework that bridges pretrained Vision Transformers and CNN backbones. It introduces two modules: the Feature Alignment Module (FAM), which transfers knowledge through channel and spatial alignment trained with a KL/MSE-based loss, and the Feature Modulation Module (FMM), which adapts features to the target RS domain using transformer blocks and a dual-decoder setup. The approach achieves state-of-the-art performance on two tasks, fine-grained grass and cloud segmentation, with mIoU improvements of 2.57 and 3.73 points over strong baselines, respectively, demonstrating the value of combining knowledge transfer with domain adaptation. A new fine-grained grass dataset is also introduced to address boundary ambiguity and misclassification, with practical relevance for ecological monitoring and remote sensing analysis.
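The gains quoted here are in mIoU (mean Intersection over Union), the standard semantic segmentation metric. The following minimal Python sketch shows one common way to compute it from flattened label maps. The function names are hypothetical and this is not the paper's evaluation code; in particular, codebases differ on whether a class absent from both prediction and ground truth is skipped (as assumed here) or counted. Scores are typically reported as percentages, so a gain of 2.57 mIoU means 2.57 percentage points.

```python
from typing import List, Sequence


def confusion_matrix(pred: Sequence[int], target: Sequence[int], num_classes: int) -> List[List[int]]:
    """cm[t][p] counts pixels whose ground-truth class is t and predicted class is p."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of pixels")
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        cm[t][p] += 1
    return cm


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean of per-class IoU = TP / (TP + FP + FN), skipping classes absent from both maps."""
    cm = confusion_matrix(pred, target, num_classes)
    ious = []
    for c in range(num_classes):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                                  # pixels of class c that were missed
        fp = sum(cm[r][c] for r in range(num_classes)) - tp   # other classes predicted as c
        if tp + fp + fn > 0:
            ious.append(tp / (tp + fp + fn))
    if not ious:
        raise ValueError("no labeled pixels to evaluate")
    return sum(ious) / len(ious)


# Toy flattened label maps with 3 classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
print(f"mIoU = {100 * mean_iou(pred, target, 3):.2f}")  # prints "mIoU = 50.00"
```

Here the per-class IoUs are 1/3, 2/3 and 1/2, whose mean is 0.5.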

Abstract

Fine-grained remote sensing image segmentation is essential for accurately identifying detailed objects in remote sensing images. Recently, vision transformer models (VTMs) pretrained on large-scale datasets have demonstrated strong zero-shot generalization. However, applying them directly to specific tasks can degrade performance because of domain shift. We introduce a novel end-to-end learning paradigm that combines knowledge guidance with domain refinement to enhance performance. It has two key components: the Feature Alignment Module (FAM) and the Feature Modulation Module (FMM). FAM aligns features from a CNN-based backbone with those from the pretrained VTM's encoder using channel transformation and spatial interpolation, and transfers knowledge via a KL divergence and an L2 normalization constraint. FMM further adapts the transferred knowledge to the target domain to address domain shift. We also introduce a fine-grained grass segmentation dataset and demonstrate, through experiments on two datasets, that our method achieves significant improvements of 2.57 mIoU on the grass dataset and 3.73 mIoU on the cloud dataset. The results highlight the potential of combining knowledge transfer and domain adaptation to overcome domain-related challenges and data limitations. The project page is available at https://xavierjiezou.github.io/KTDA/.

Paper Structure

This paper contains 27 sections, 9 equations, 3 figures, 6 tables.

Figures (3)

  • Figure 1: Overview of the proposed framework that integrates knowledge transfer and domain adaptation for fine-grained remote sensing image segmentation.
  • Figure 2: Detailed structure of the FAM and FMM.
  • Figure 3: Visual comparison of segmentation results from different models on the fine-grained grass and cloud segmentation datasets.
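The summary states only that FMM adapts features to the target domain using transformer blocks. As a hedged illustration of the core operation such a block performs, the sketch below applies one residual single-head scaled dot-product self-attention layer to a set of feature tokens (for example, flattened pixels of an aligned feature map). Layer normalization, the feed-forward sublayer, multi-head splitting and the dual-decoder setup are omitted; the projection matrices `w_q`, `w_k`, `w_v` and all function names are hypothetical, and this is not the authors' FMM.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows are tokens, columns are feature dimensions


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def softmax_rows(m: Matrix) -> Matrix:
    out = []
    for row in m:
        peak = max(row)                      # subtract the max for numerical stability
        e = [math.exp(v - peak) for v in row]
        total = sum(e)
        out.append([v / total for v in e])
    return out


def residual_self_attention(tokens: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Matrix:
    """tokens + softmax(Q K^T / sqrt(d)) V, with Q, K, V linear projections of the tokens.

    w_v must map back to the token width so that the residual sum is well defined.
    """
    q, k, v = matmul(tokens, w_q), matmul(tokens, w_k), matmul(tokens, w_v)
    scale = 1.0 / math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]     # transpose K
    attn = softmax_rows([[s * scale for s in row] for row in matmul(q, k_t)])
    mixed = matmul(attn, v)                  # each token becomes a weighted mix of all values
    return [[a + b for a, b in zip(t, m)] for t, m in zip(tokens, mixed)]


tokens = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # three tokens with 2 channels each
eye = [[1.0, 0.0], [0.0, 1.0]]
refined = residual_self_attention(tokens, eye, eye, eye)
```

Because of the residual connection, setting `w_v` to zeros returns the input tokens unchanged, so a layer of this form can start close to an identity and add a correction on top of the transferred features, which is one common reason residual designs are used.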