Earth-Adapter: Bridge the Geospatial Domain Gaps with Mixture of Frequency Adaptation

Xiaoxing Hu, Ziyang Gong, Yupei Wang, Yuru Jia, Fei Lin, Dexiang Gao, Ke An, Jianhong Han, Zhuoran Sun, Gen Luo, Xue Yang

TL;DR

Earth-Adapter addresses artifact-induced degradation of Remote Sensing (RS) feature representations when Parameter-Efficient Fine-Tuning (PEFT) is applied to RS semantic segmentation. It introduces a frequency-guided Mixture of Adapters (MoA): features are split into low- and high-frequency components via the Discrete Fourier Transform (DFT), and a dynamic router adaptively fuses the adapter outputs, all while the backbone Vision Foundation Models (VFMs) stay frozen. Across semantic segmentation (SS), Domain Adaptation (DA), and Domain Generalization (DG) tasks, it achieves state-of-the-art results on 12 RS benchmarks, with notable DA gains and robust generalization improvements, demonstrating effective artifact suppression and feature denoising in RS imagery. The work offers practical benefits for deploying large VFMs in RS applications with limited fine-tuning, and provides extensive analyses of adapter configurations, frequency cutoffs, and layer choices to guide future RS PEFT design.

Abstract

Parameter-Efficient Fine-Tuning (PEFT) is a technique that allows us to adapt powerful Foundation Models (FMs) to diverse downstream tasks while preserving and unleashing their inherent capabilities. However, we have observed that existing PEFT methods, which are often designed with natural imagery in mind, struggle when applied to Remote Sensing (RS) scenarios. This is primarily due to their inability to handle artifact influences, a problem particularly severe in RS image features. To tackle this challenge, we introduce Earth-Adapter, the first PEFT method specifically designed to conquer RS artifacts. Earth-Adapter introduces a novel Mixture of Frequency Adaptation process that combines a Mixture of Adapter (MoA) with the Discrete Fourier Transformation (DFT). By utilizing DFT, Earth-Adapter can decompose features into different frequency components, precisely separating artifacts from original features. The MoA then dynamically assigns weights to each adapter expert, allowing for the combination of features across various frequency domains. These simple-yet-effective approaches enable Earth-Adapter to overcome the disturbances caused by artifacts more efficiently than previous PEFT methods, significantly enhancing the FMs' performance in RS scenarios. Experiments on Domain Adaptation (DA) and Domain Generalization (DG) semantic segmentation benchmarks showcase Earth-Adapter's effectiveness. Compared with the baseline Rein, Earth-Adapter improves mIoU by 9.0% in DA and 3.1% in DG benchmarks. Our code will be released at https://github.com/VisionXLab/Earth-Adapter.
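
The frequency split and the routed fusion described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the circular low-pass mask with a `cutoff` ratio, the ReLU bottleneck adapters, the use of exactly two experts (one per frequency band), and the router that scores experts from globally pooled features are all assumptions made for illustration. The real code may differ in each of these choices.

```python
import numpy as np

def split_frequencies(feat, cutoff=0.3):
    """Split a (C, H, W) feature map into low- and high-frequency parts via a 2D DFT.

    `cutoff` is a hypothetical hyperparameter: the fraction of the largest
    centered-spectrum radius that is treated as "low frequency".
    """
    _, h, w = feat.shape
    # 2D DFT per channel, shifted so the zero-frequency bin sits at (h // 2, w // 2).
    spectrum = np.fft.fftshift(np.fft.fft2(feat, axes=(-2, -1)), axes=(-2, -1))
    yy, xx = np.meshgrid(np.arange(h) - h // 2, np.arange(w) - w // 2, indexing="ij")
    radius = np.sqrt(yy**2 + xx**2)
    low_mask = radius <= cutoff * radius.max()

    def to_spatial(spec):
        # Undo the shift and return to the spatial domain, keeping the real part.
        return np.fft.ifft2(np.fft.ifftshift(spec, axes=(-2, -1)), axes=(-2, -1)).real

    low = to_spatial(spectrum * low_mask)
    high = to_spatial(spectrum * ~low_mask)
    return low, high  # complementary masks, so low + high reconstructs feat

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def adapter(x, down, up):
    """Low-rank bottleneck adapter applied at every spatial position (assumed design)."""
    hidden = np.maximum(np.einsum("rc,chw->rhw", down, x), 0.0)  # down-project + ReLU
    return np.einsum("cr,rhw->chw", up, hidden)                  # up-project

def mixture_of_frequency_adaptation(feat, params, router_w, cutoff=0.3):
    """Fuse per-band adapter outputs with router weights and add them residually.

    `params` holds one (down, up) pair per expert; `router_w` has shape (2, C).
    The backbone feature `feat` itself is never modified, mirroring a frozen backbone.
    """
    low, high = split_frequencies(feat, cutoff)
    outputs = np.stack([adapter(b, *p) for b, p in zip((low, high), params)])
    # Assumed router: one scalar weight per expert from the globally pooled feature.
    weights = softmax(router_w @ feat.mean(axis=(1, 2)))
    delta = np.tensordot(weights, outputs, axes=1)  # weighted sum over experts
    return feat + delta, weights
```

In this sketch only the adapter matrices and `router_w` would be trainable, which is what keeps the approach parameter-efficient; the cutoff controls how much of the spectrum each expert sees.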

Paper Structure

This paper contains 22 sections, 17 equations, 4 figures, 12 tables.

Figures (4)

  • Figure 1: Performance across various remote sensing image segmentation benchmarks between Frozen VFM (DINOv2-L), Rein (Baseline) and the proposed Earth-Adapter.
  • Figure 2: Motivation and Structure Details of Earth-Adapter. (a) points out the artifact problems in existing PEFT methods. (b) illustrates how Earth-Adapter divides and conquers the artifacts with a frequency-guided strategy and the MoA framework. ①, ②, and ③ show the sequence of steps in the DFT operation. (c) introduces the details of the Earth-Adapter component structures.
  • Figure 3: Visualization of Predicted Segmentation Maps. We compare Earth-Adapter with the frozen DINOv2-L backbone and our baseline Rein on eight cross-domain benchmarks. For the Potsdam and Vaihingen color map, white is impervious surface, red is clutter, blue is building, cyan is low vegetation, green is tree, and yellow is car. For the LoveDA color map, red is building, yellow is road, blue is water, purple is barren, green is forest, and brown is agriculture.
  • Figure 4: Visualization and PCA of Adapters' Feature Maps. 'Agg. Feature' represents the aggregated adapters' features. 'PCA' represents the Principal Component Analysis of features. All visualizations represent feature maps, not heatmaps. Thus, the focus should be on the semantic boundaries within the features rather than on color intensities.