FlexiMo: A Flexible Remote Sensing Foundation Model

Xuyang Li; Chenyu Li; Pedram Ghamisi; Danfeng Hong

TL;DR

FlexiMo addresses the fixed-resolution limitation of remote sensing foundation models (RSFMs) with two components: PI-Resize, a resolution-aware, parameter-free alignment of the patch embedding, and a wavelength-guided channel adaptation mechanism. Together they let a ViT-based RSFM accept arbitrary input resolutions and channel counts without changes to the backbone, so it can be fine-tuned efficiently on RGB, multispectral, and SAR data. Experiments report state-of-the-art or competitive performance on image- and pixel-level tasks across diverse datasets, and ablations test robustness to varying image sizes, patch sizes, and channel configurations. The improved spatial adaptability and spectral fidelity make foundation models easier to deploy in real-world, multi-source remote sensing pipelines.
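To make the idea of a parameter-free patch embedding alignment concrete, the following is a minimal NumPy sketch of a pseudo-inverse resize of patch-embedding kernels. It assumes that "PI-Resize" refers to the pseudo-inverse resize known from flexible-patch ViTs; the summary above does not spell this out, so that reading, the linear interpolation operator, and the function names `linear_resize_matrix` and `pi_resize` are all assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def linear_resize_matrix(old: int, new: int) -> np.ndarray:
    """Return B (new x old) such that B @ v linearly interpolates v to length `new`."""
    B = np.zeros((new, old))
    pos = np.linspace(0.0, old - 1, new)      # sample positions on the old grid
    lo = np.floor(pos).astype(int)
    hi = np.minimum(lo + 1, old - 1)
    frac = pos - lo
    for i in range(new):
        B[i, lo[i]] += 1.0 - frac[i]
        B[i, hi[i]] += frac[i]
    return B


def pi_resize(weight: np.ndarray, new_patch: int) -> np.ndarray:
    """Resize kernels of shape (D, C, p, p) to (D, C, new_patch, new_patch).

    Assumed technique: multiply each flattened kernel by the pseudo-inverse of
    the transposed resize operator, so that tokens computed from a resized
    patch match tokens computed from the original patch (exactly so when
    upsampling with a full-rank operator). No learned parameters are involved.
    """
    d, c, p, _ = weight.shape
    B1 = linear_resize_matrix(p, new_patch)
    B = np.kron(B1, B1)                        # 2-D resize on row-major flattened patches
    P = np.linalg.pinv(B.T)                    # shape (new_patch**2, p**2)
    flat = weight.reshape(d * c, p * p)
    return (flat @ P.T).reshape(d, c, new_patch, new_patch)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(3, 2, 4, 4))          # toy kernels: D=3, C=2, p=4
    x = rng.normal(size=(2, 4, 4))             # one toy patch with 2 channels
    B1 = linear_resize_matrix(4, 8)
    x_up = np.stack([B1 @ ch @ B1.T for ch in x])
    tok_old = np.einsum("dchw,chw->d", w, x)
    tok_new = np.einsum("dchw,chw->d", pi_resize(w, 8), x_up)
    print(np.allclose(tok_old, tok_new))
```

Under these assumptions, the token produced by the resized kernel on the upsampled patch equals the token produced by the original kernel on the original patch, which is one way a pre-trained patch embedding could be reused at a different patch size without retraining.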

Abstract

The rapid expansion of multi-source satellite imagery drives innovation in Earth observation, opening unprecedented opportunities for Remote Sensing Foundation Models to harness diverse data. However, many existing models remain constrained by fixed spatial resolutions and patch sizes, limiting their ability to fully exploit the heterogeneous spatial characteristics inherent in satellite imagery. To address these challenges, we propose FlexiMo, a flexible remote sensing foundation model that endows the pre-trained model with the flexibility to adapt to arbitrary spatial resolutions. Central to FlexiMo is a spatial resolution-aware module that employs a parameter-free alignment embedding mechanism to dynamically recalibrate patch embeddings based on the input image's resolution and dimensions. This design not only preserves critical token characteristics and ensures multi-scale feature fidelity but also enables efficient feature extraction without requiring modifications to the underlying network architecture. In addition, FlexiMo incorporates a lightweight channel adaptation module that leverages prior spectral information from sensors. This mechanism allows the model to process images with varying numbers of channels while maintaining the data's intrinsic physical properties. Extensive experiments on diverse multimodal, multi-resolution, and multi-scale datasets demonstrate that FlexiMo significantly enhances model generalization and robustness. In particular, our method achieves outstanding performance across a range of downstream tasks, including scene classification, land cover classification, urban building segmentation, and cloud detection. By enabling parameter-efficient and physically consistent adaptation, FlexiMo paves the way for more adaptable and effective foundation models in real-world remote sensing applications.
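As an illustration of how spectral priors could let one patch embedding handle a varying number of channels, the sketch below generates a per-band kernel from each band's central wavelength. This is a hypothetical design: the sinusoidal encoding, the single linear generator `G`, the class name `WavelengthPatchEmbed`, and the example wavelengths are assumptions made for this sketch and are not taken from the paper.

```python
import numpy as np


def wavelength_encoding(wavelengths_um, dim=16):
    """Sinusoidal code for each band's central wavelength (given in micrometres)."""
    wl_nm = np.asarray(wavelengths_um, dtype=float)[:, None] * 1000.0   # (C, 1), nanometres
    freqs = 1.0 / (10000.0 ** (np.arange(dim // 2) / (dim // 2)))       # (dim/2,)
    angles = wl_nm * freqs
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)     # (C, dim)


class WavelengthPatchEmbed:
    """Hypothetical patch embedding whose kernels are generated per band."""

    def __init__(self, embed_dim=8, patch=4, enc_dim=16, seed=0):
        rng = np.random.default_rng(seed)
        self.embed_dim, self.patch, self.enc_dim = embed_dim, patch, enc_dim
        # Assumed generator: one linear map from wavelength code to a flattened kernel.
        self.G = rng.normal(scale=0.02, size=(enc_dim, embed_dim * patch * patch))

    def kernels(self, wavelengths_um):
        enc = wavelength_encoding(wavelengths_um, self.enc_dim)          # (C, enc_dim)
        k = (enc @ self.G).reshape(len(wavelengths_um), self.embed_dim,
                                   self.patch, self.patch)
        return k.transpose(1, 0, 2, 3)                                   # (D, C, p, p)

    def __call__(self, image, wavelengths_um):
        c, h, w = image.shape
        p = self.patch
        k = self.kernels(wavelengths_um)
        # Split into non-overlapping patches: (n_h, n_w, C, p, p).
        patches = image.reshape(c, h // p, p, w // p, p).transpose(1, 3, 0, 2, 4)
        tokens = np.einsum("hwcij,dcij->hwd", patches, k)
        return tokens.reshape(-1, self.embed_dim)                        # (n_h * n_w, D)


if __name__ == "__main__":
    embed = WavelengthPatchEmbed()
    rgb = np.ones((3, 8, 8))
    four_band = np.ones((4, 16, 16))
    print(embed(rgb, [0.665, 0.560, 0.490]).shape)               # 3 bands, 2x2 patches
    print(embed(four_band, [0.665, 0.560, 0.490, 0.842]).shape)  # 4 bands, 4x4 patches
```

In this sketch the number of input channels is set only by the length of the wavelength list, so the same module accepts three-band and four-band inputs, and a band with a given wavelength receives the same kernel regardless of which other bands accompany it.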
