Generalizable Multispectral Land Cover Classification via Frequency-Aware Mixture of Low-Rank Token Experts
Xi Chen, Shen Yan, Juelin Zhu, Chen Chen, Yu Liu, Maojun Zhang
TL;DR
The paper tackles the generalization problem that spectral shift causes in multispectral land cover classification (MLCC) by fine-tuning Vision Foundation Models (VFMs) with Land-MoE, a parameter-efficient adapter framework. It introduces two modules: the Mixture of Low-rank Token Experts (MoLTE), which produces instance-aware feature adjustments via rank-differentiated tokens, and Frequency-aware Filters (FAF), which modulate features in the frequency domain to preserve semantically relevant information while suppressing noise. Both modules are applied layer by layer to refine representations throughout the network, and training combines a Mask2Former semantic segmentation loss with an MoLTE expert-balancing term. Extensive cross-sensor and cross-geospatial experiments on data at the five-billion-pixel scale demonstrate state-of-the-art generalization in MLCC, along with strong generalization on RGB remote sensing images, underscoring Land-MoE's practical value for scalable, robust land cover mapping.
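The TL;DR states that training combines a Mask2Former semantic loss with an MoLTE expert-balancing term, but it does not give that term's exact form. The minimal sketch below therefore swaps in the Switch Transformer load-balancing loss as an assumption and treats the Mask2Former loss as a precomputed scalar. The names `switch_balance_loss`, `total_loss`, and `balance_weight` are hypothetical and are not taken from the paper.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one token's router logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]


def switch_balance_loss(router_logits: List[List[float]]) -> float:
    """Switch-Transformer-style load-balancing loss.

    ASSUMPTION: a stand-in for the paper's MoLTE expert-balancing term.
    router_logits holds one list of E expert logits per token. Returns
    E * sum_i f_i * P_i, where f_i is the fraction of tokens whose top-1
    expert is i and P_i is the mean gate probability of expert i.
    """
    num_tokens = len(router_logits)
    num_experts = len(router_logits[0])
    probs = [softmax(row) for row in router_logits]
    counts = [0] * num_experts
    for p in probs:
        counts[max(range(num_experts), key=p.__getitem__)] += 1  # top-1 expert
    frac = [c / num_tokens for c in counts]
    mean_prob = [sum(p[i] for p in probs) / num_tokens for i in range(num_experts)]
    return num_experts * sum(f * q for f, q in zip(frac, mean_prob))


def total_loss(seg_loss: float, router_logits: List[List[float]],
               balance_weight: float = 0.01) -> float:
    """Hypothetical objective: segmentation loss (e.g. a precomputed
    Mask2Former loss) plus a weighted expert-balancing term."""
    return seg_loss + balance_weight * switch_balance_loss(router_logits)


balanced = [[2.0, 0.0], [0.0, 2.0]]     # tokens split evenly across two experts
collapsed = [[10.0, 0.0], [10.0, 0.0]]  # every token routed to expert 0
print(switch_balance_loss(balanced))    # close to 1.0 (uniform routing)
print(switch_balance_loss(collapsed))   # close to 2.0 (routing collapsed)
```

Because this term equals 1.0 under uniform routing and approaches the number of experts as routing collapses onto one expert, a small `balance_weight` nudges the router to use experts of different ranks without dominating the segmentation objective.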
Abstract
We introduce Land-MoE, a novel approach for multispectral land cover classification (MLCC). Spectral shift, which arises from differences in sensors and geospatial conditions, poses a significant challenge in this domain. Existing methods predominantly rely on domain adaptation and domain generalization strategies, often using small-scale models with limited performance. In contrast, Land-MoE hierarchically inserts a Frequency-aware Mixture of Low-rank Token Experts into Vision Foundation Models (VFMs) to fine-tune them in a parameter-efficient manner. Specifically, Land-MoE comprises two key modules: the mixture of low-rank token experts (MoLTE) and frequency-aware filters (FAF). MoLTE leverages rank-differentiated tokens to generate diverse feature adjustments for individual instances within multispectral images; by dynamically combining learnable low-rank token experts of varying ranks, it improves robustness to spectral shifts. FAF then performs frequency-domain modulation on the refined features, enabling the model to capture frequency bands strongly correlated with semantic content while suppressing task-irrelevant frequency noise. Comprehensive experiments on MLCC tasks in cross-sensor and cross-geospatial settings demonstrate that Land-MoE outperforms existing methods by a large margin. Additionally, the proposed approach achieves state-of-the-art performance on domain generalization semantic segmentation of RGB remote sensing images.
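The abstract describes MoLTE as a dynamic combination of learnable low-rank token experts with different ranks, followed by FAF modulation of the refined features in the frequency domain. The NumPy sketch below illustrates one plausible reading under stated assumptions: each expert is a set of `m` learnable tokens factorized as `A_e @ B_e` with rank at most `r_e`, features attend to those tokens, a linear softmax router mixes the experts per feature, and FAF is modeled as an elementwise weight on the real 2-D FFT of each channel. `MoLTESketch`, `frequency_filter`, `low_pass_weight`, and all parameter names are hypothetical; this is not the authors' implementation.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


class MoLTESketch:
    """Hypothetical mixture of low-rank token experts for one VFM layer.

    Expert e owns `num_tokens` learnable tokens T_e = A_e @ B_e whose rank is
    at most ranks[e], so experts differ in capacity. Features attend to each
    expert's tokens, and a linear router weights the experts per feature.
    """

    def __init__(self, dim: int, num_tokens: int, ranks, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.A = [rng.normal(0.0, 0.02, (num_tokens, r)) for r in ranks]
        self.B = [rng.normal(0.0, 0.02, (r, dim)) for r in ranks]
        self.router = rng.normal(0.0, 0.02, (dim, len(ranks)))
        self.scale = dim ** -0.5

    def __call__(self, x: np.ndarray) -> np.ndarray:
        """x: (N, D) features from a frozen layer; returns refined (N, D)."""
        gates = softmax(x @ self.router)  # (N, E) per-feature expert weights
        delta = np.zeros_like(x)
        for e, (A, B) in enumerate(zip(self.A, self.B)):
            tokens = A @ B  # (m, D) low-rank token matrix
            attn = softmax(x @ tokens.T * self.scale)  # (N, m) feature-token affinity
            delta += gates[:, e:e + 1] * (attn @ tokens)  # gated adjustment
        return x + delta  # residual update keeps the frozen features


def frequency_filter(feat: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Hypothetical FAF: reweight spatial frequencies of a (C, H, W) map.

    `weight` must broadcast to (C, H, W // 2 + 1), the rfft2 layout. Values
    near 0 suppress a frequency band and values near 1 keep it.
    """
    H, W = feat.shape[-2:]
    spec = np.fft.rfft2(feat, axes=(-2, -1))
    return np.fft.irfft2(spec * weight, s=(H, W), axes=(-2, -1))


def low_pass_weight(H: int, W: int, cutoff: float = 0.25) -> np.ndarray:
    """Fixed low-pass mask in rfft2 layout (a learnable weight would replace it)."""
    fy = np.abs(np.fft.fftfreq(H))[:, None]  # (H, 1) cycles per pixel
    fx = np.fft.rfftfreq(W)[None, :]  # (1, W // 2 + 1)
    return ((fy < cutoff) & (fx < cutoff)).astype(float)


# Usage: MoLTE refines flattened tokens, then FAF filters the refined map.
C, H, W = 8, 16, 16
feat = np.random.default_rng(1).normal(size=(C, H, W))
molte = MoLTESketch(dim=C, num_tokens=4, ranks=[1, 2, 4])
flat = feat.reshape(C, H * W).T  # (N, C) with N = H * W
refined = molte(flat).T.reshape(C, H, W)
filtered = frequency_filter(refined, low_pass_weight(H, W))
print(filtered.shape)  # (8, 16, 16)
```

The residual form means that zeroed expert tokens leave the frozen VFM features unchanged, similar to how LoRA zero-initializes one factor; varying `ranks` gives the router experts of different capacities to choose among.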
