Generalizable Multispectral Land Cover Classification via Frequency-Aware Mixture of Low-Rank Token Experts

Xi Chen, Shen Yan, Juelin Zhu, Chen Chen, Yu Liu, Maojun Zhang

TL;DR

The paper addresses the generalization problems caused by spectral shift in multispectral land cover classification (MLCC) by fine-tuning Vision Foundation Models with Land-MoE, a parameter-efficient adapter framework. It introduces two modules: Mixture of Low-rank Token Experts (MoLTE), which makes instance-aware feature adjustments using rank-differentiated tokens, and Frequency-aware Filters (FAF), which modulate features in the frequency domain to preserve semantically relevant information while suppressing noise. Training combines a Mask2Former semantic loss with a MoLTE expert-balancing term, and MoLTE and FAF are applied layer by layer to refine representations across the network. Extensive cross-sensor and cross-geospatial experiments on data at the five-billion-pixel scale demonstrate state-of-the-art generalization in MLCC, along with strong generalization on RGB remote sensing imagery, which supports Land-MoE’s practical value for scalable, robust land-cover mapping.
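As a rough illustration of how a segmentation loss and an expert-balancing term can be combined, the sketch below adds a weighted penalty to a scalar loss value. The penalty is the squared coefficient of variation of per-expert importance, a common choice in the sparse mixture-of-experts literature; it is an assumption here and not necessarily the term used in the paper. The names `balance_penalty`, `total_loss`, and `balance_weight` are hypothetical.

```python
def balance_penalty(gate_rows):
    """Squared coefficient of variation of per-expert importance.

    gate_rows: one list of gate weights per sample (rows sum to 1).
    Returns 0 when every expert receives the same total weight.
    This is an assumed stand-in, not the paper's exact balancing term.
    """
    num_experts = len(gate_rows[0])
    importance = [sum(row[e] for row in gate_rows) for e in range(num_experts)]
    mean = sum(importance) / num_experts
    variance = sum((v - mean) ** 2 for v in importance) / num_experts
    return variance / (mean ** 2 + 1e-12)


def total_loss(segmentation_loss, gate_rows, balance_weight=0.01):
    """Segmentation loss plus a weighted balancing penalty (weight is hypothetical)."""
    return segmentation_loss + balance_weight * balance_penalty(gate_rows)


# Balanced routing adds nothing; routing everything to one expert is penalized.
balanced = total_loss(1.0, [[0.5, 0.5], [0.5, 0.5]], balance_weight=0.1)
collapsed = total_loss(1.0, [[1.0, 0.0], [1.0, 0.0]], balance_weight=0.1)
```

In this toy case the collapsed routing raises the loss, which is the pressure a balancing term is meant to apply: it discourages the gate from relying on a single expert.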

Abstract

We introduce Land-MoE, a novel approach for multispectral land cover classification (MLCC). Spectral shift, which emerges from disparities in sensors and geospatial conditions, poses a significant challenge in this domain. Existing methods predominantly rely on domain adaptation and generalization strategies, often utilizing small-scale models that exhibit limited performance. In contrast, Land-MoE addresses these issues by hierarchically inserting a Frequency-aware Mixture of Low-rank Token Experts to fine-tune Vision Foundation Models (VFMs) in a parameter-efficient manner. Specifically, Land-MoE comprises two key modules: the mixture of low-rank token experts (MoLTE) and frequency-aware filters (FAF). MoLTE leverages rank-differentiated tokens to generate diverse feature adjustments for individual instances within multispectral images. By dynamically combining learnable low-rank token experts of varying ranks, it enhances robustness against spectral shifts. Meanwhile, FAF conducts frequency-domain modulation on the refined features. This process enables the model to capture frequency-band information that is strongly correlated with semantic essence while suppressing frequency noise irrelevant to the task. Comprehensive experiments on MLCC tasks involving cross-sensor and cross-geospatial setups demonstrate that Land-MoE outperforms existing methods by a large margin. The proposed approach also achieves state-of-the-art performance in domain generalization semantic segmentation of RGB remote sensing images.
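To make the idea of "dynamically combining learnable low-rank token experts of varying ranks" more concrete, here is a minimal pure-Python sketch. Everything in it is an assumption for illustration: each expert's token matrix is built as a product of two small matrices (so its rank is bounded), features attend to those tokens to obtain an adjustment, and a softmax gate computed from mean-pooled features mixes the experts. The abstract does not specify the gating or attention details, and the names `make_expert`, `expert_delta`, and `molte_like_layer` are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(rng, num_tokens, rank, dim):
    """One token expert: tokens = A @ B, so the token matrix has rank <= `rank`."""
    a = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(num_tokens)]
    b = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(rank)]
    return matmul(a, b)  # shape: num_tokens x dim


def expert_delta(features, tokens):
    """Attend from each feature vector to the expert's tokens; return an adjustment."""
    dim = len(features[0])
    tokens_t = [list(col) for col in zip(*tokens)]           # dim x num_tokens
    scores = matmul(features, tokens_t)                      # n x num_tokens
    attn = [softmax([s / math.sqrt(dim) for s in row]) for row in scores]
    return matmul(attn, tokens)                              # n x dim


def molte_like_layer(features, experts, gate_weights):
    """Gate the experts per image (assumed) and add the mixed adjustment residually."""
    n, dim = len(features), len(features[0])
    pooled = [sum(f[j] for f in features) / n for j in range(dim)]
    logits = [sum(p * w for p, w in zip(pooled, col)) for col in zip(*gate_weights)]
    gates = softmax(logits)                                  # one weight per expert
    deltas = [expert_delta(features, tokens) for tokens in experts]
    out = [[features[i][j] + sum(g * d[i][j] for g, d in zip(gates, deltas))
            for j in range(dim)] for i in range(n)]
    return out, gates


rng = random.Random(0)
dim, n = 8, 6
features = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
experts = [make_expert(rng, num_tokens=4, rank=r, dim=dim) for r in (1, 2, 4)]
gate_weights = [[rng.gauss(0, 0.1) for _ in experts] for _ in range(dim)]  # dim x experts
refined, gates = molte_like_layer(features, experts, gate_weights)
```

The residual form (`features + adjustment`) keeps the frozen backbone's features intact when the adjustments are small, which is a typical design choice for parameter-efficient adapters; whether Land-MoE uses exactly this form is not stated in the abstract.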

Paper Structure

This paper contains 32 sections, 11 equations, 8 figures, 11 tables.

Figures (8)

  • Figure 1: Spectral shift in multispectral imagery. Variations in sensor characteristics and geospatial conditions can lead to significant divergence in the spectral signatures of land cover features belonging to the same class.
  • Figure 2: Overview of Land-MoE. 1. Land-MoE hierarchically inserts well-designed adapters into VFM backbone networks in a parameter-efficient manner to enhance their generalization for the cross-domain MLCC. 2. Land-MoE has two key modules, the Mixture of Low-rank Token Experts (MoLTE) and the Frequency-Aware Filters (FAF). 3. MoLTE enhances the adaptability of feature adjustments to spectral shifts by leveraging low-rank learnable token experts with varying ranks. 4. FAF performs frequency-domain modulation on the refined features output by the MoLTE module, perceiving frequency-domain features inherently correlated with semantic essence.
  • Figure 3: Qualitative results for cross-sensor MLCC task. Comparative visualization of land cover classification from the IID-based method DSTC [liu2024dual], frozen DINOv2 + Mask2Former decoder, VFM-based DG semantic segmentation methods (SET [yi2024learning], Rein [wei2024stronger], FADA [bi2024learning]), and our proposed Land-MoE. Input MSIs and corresponding ground truth maps are also shown for reference. Land-MoE exhibits superior accuracy in challenging cross-sensor scenarios. Please zoom in to the white box region to see more details.
  • Figure 4: Geographical distribution of the source domain (SD) and target domains (TDs) for the constructed cross-sensor and cross-geospatial generalization tasks. Subfigure (a) presents the domain distribution for the cross-sensor task, where locations corresponding to the SD (GF-2 imagery) are marked by blue solid circles, and those corresponding to the TDs (PlanetScope, GF-1, and Sentinel-2 imagery) are indicated by red circles. Subfigure (b) illustrates the domain distribution for the cross-geospatial task, with blue solid circles representing the SD (GF-2 imagery from various regions) and red solid circles denoting the TD (GF-2 imagery from designated cities).
  • Figure 5: Qualitative results showing predicted land cover classification maps for the cross-sensor generalization task. The figure illustrates the performance of Land-MoE in comparison to state-of-the-art baseline methods on cross-scene multispectral remote sensing images.
  • ...and 3 more figures
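
The Figure 2 caption describes FAF as frequency-domain modulation of the refined features. A minimal sketch of that general idea follows, under stated assumptions: it transforms each channel along the token axis with a naive 1D discrete Fourier transform, scales every frequency bin by a gain (learnable in a real model, fixed here), and transforms back. The actual module may operate on 2D spatial feature maps and use a different parameterization; `dft` and `frequency_filter` are hypothetical helper names.

```python
import cmath


def dft(signal, inverse=False):
    """Naive discrete Fourier transform (O(n^2)), adequate for a small demo."""
    n = len(signal)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                  for t, x in enumerate(signal))
        out.append(acc / n if inverse else acc)
    return out


def frequency_filter(features, gains):
    """Scale each frequency bin of every channel by the matching gain.

    features: n tokens x c channels (lists of rows).
    gains: n real multipliers, one per frequency bin. Gains that are not
    symmetric (gains[k] != gains[n - k]) give a complex result whose
    imaginary part is discarded here.
    """
    n, c = len(features), len(features[0])
    out = [[0.0] * c for _ in range(n)]
    for j in range(c):
        spectrum = dft([features[i][j] for i in range(n)])
        modulated = [g * s for g, s in zip(gains, spectrum)]
        restored = dft(modulated, inverse=True)
        for i in range(n):
            out[i][j] = restored[i].real
    return out


feats = [[1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0]]
unchanged = frequency_filter(feats, [1.0, 1.0, 1.0, 1.0])  # all-pass filter
dc_only = frequency_filter(feats, [1.0, 0.0, 0.0, 0.0])    # keep only the mean
```

With all gains set to one the features pass through unchanged, and keeping only the zero-frequency bin collapses each channel to its mean, which shows how per-frequency gains can retain some bands and suppress others.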