The View From Space: Navigating Instrumentation Differences with EOFMs
Ryan P. Demilt, Nicholas LaHaye, Karis Tenneson
TL;DR
The paper investigates how Earth Observation Foundation Models (EOFMs) encode information from diverse sensor modalities and finds that their embedding spaces are highly sensitive to sensor architecture. The authors construct a multi-modal paired dataset and analyze the frozen pretrained encoders Prithvi and DOFA through embedding visualization and local neighborhood measures, demonstrating modality-driven clustering and substantial shifts in neighborhood structure between optical and SAR inputs. They show that the source modality can be predicted from embeddings with high accuracy, indicating strong source-specific partitioning, and discuss the implications for cross-modality generalization and benchmarking. The work underscores the need for multimodal pretraining and curated cross-sensor datasets to ensure robust remote-sensing representations, and cautions against naive one-to-one band matching across heterogeneous sensors.
Abstract
Earth Observation Foundation Models (EOFMs) have exploded in prevalence as tools for processing the massive volumes of remotely sensed and other Earth observation data, and for supporting the many essential Earth monitoring tasks. An emerging trend uses the outputs of pre-trained models as 'embeddings' that summarize high-dimensional data for generic tasks such as similarity search and content-specific queries. However, most EOFMs are trained on only a single modality of data and then applied or benchmarked by matching bands across different modalities. It is not clear from existing work what impact diverse sensor architectures have on the internal representations of the present suite of EOFMs. We show in this work that the representation space of EOFMs is highly sensitive to sensor architecture. Understanding this difference gives a vital perspective on the pitfalls of current EOFM design and points to ways forward for model developers, users, and a community guided by robust remote-sensing science.
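
To make the two kinds of analysis named in the TL;DR concrete (local neighborhood measures and modality prediction from embeddings), here is a minimal sketch in plain Python. It is not the authors' implementation. The function names, the Jaccard overlap of k-nearest-neighbor sets, the leave-one-out 1-NN probe, and the synthetic "optical" and "SAR" vectors are all assumptions made for illustration; in practice the inputs would be embeddings of paired scenes produced by a frozen encoder.

```python
import math
import random


def knn_indices(embeddings, k):
    """For each row, return the set of indices of its k nearest rows (Euclidean)."""
    neighbors = []
    for i, a in enumerate(embeddings):
        dists = sorted(
            (math.dist(a, b), j) for j, b in enumerate(embeddings) if j != i
        )
        neighbors.append({j for _, j in dists[:k]})
    return neighbors


def mean_neighborhood_overlap(emb_a, emb_b, k=5):
    """Mean Jaccard overlap of k-NN sets; row i of emb_a and emb_b is the same scene."""
    nn_a = knn_indices(emb_a, k)
    nn_b = knn_indices(emb_b, k)
    overlaps = [len(s & t) / len(s | t) for s, t in zip(nn_a, nn_b)]
    return sum(overlaps) / len(overlaps)


def loo_modality_accuracy(emb_a, emb_b):
    """Leave-one-out 1-NN accuracy for predicting which modality an embedding came from."""
    points = [(e, 0) for e in emb_a] + [(e, 1) for e in emb_b]
    correct = 0
    for i, (e, label) in enumerate(points):
        _, pred = min(
            (math.dist(e, f), lab) for j, (f, lab) in enumerate(points) if j != i
        )
        correct += pred == label
    return correct / len(points)


# Synthetic stand-ins (hypothetical, NOT real sensor data): two sets of
# 8-dimensional "embeddings" for the same 40 scenes. The second set is drawn
# independently and shifted, mimicking a modality-driven cluster.
random.seed(0)
n, dim = 40, 8
optical = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
sar = [[random.gauss(0, 1) + 10.0 for _ in range(dim)] for _ in range(n)]

overlap = mean_neighborhood_overlap(optical, sar, k=5)
accuracy = loo_modality_accuracy(optical, sar)
print(f"mean k-NN overlap across modalities: {overlap:.2f}")
print(f"leave-one-out modality accuracy:     {accuracy:.2f}")
```

Under this sketch, a low overlap means the scenes that are neighbors in one modality's embedding space are not neighbors in the other's, and an accuracy near 1 means the embeddings are cleanly partitioned by source. Both are only toy analogues of the measures the paper reports.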
