Evaluating and Benchmarking Foundation Models for Earth Observation and Geospatial AI
Nikolaos Dionelis, Casper Fibaek, Luke Camilleri, Andreas Luyts, Jente Bosmans, Bertrand Le Saux
TL;DR
The paper tackles evaluating and benchmarking Foundation Models (FMs) for Earth Observation (EO) and geospatial AI, with the goal of joint, high-accuracy, multi-task performance from limited labels. It formalizes a cost framework contrasting problem-specific models ($C_1 \approx M y$) with Foundation Model–based approaches ($C_2 \approx y (1 + \tfrac{N}{100} M)$), where $y$ is the per-task labelling cost, $M$ the number of tasks, and $N\%$ the fraction of labels needed to adapt the FM to each task; this highlights label efficiency when $N$ is around $10$–$20$ and $M$ is moderate. It then introduces an evaluation benchmark to standardize generalization measurements across EO FM systems and demonstrates, on tasks such as land cover segmentation, that Foundation Models can outperform task-specific counterparts under both fine-tuning and linear probing. Overall, the work supports the deployment of Foundation Models for label-efficient, multi-task EO analysis and provides a practical standard for fair cross-model comparisons.
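The cost comparison above can be sketched numerically. This is a minimal illustration of the two cost formulas, not the paper's code; the concrete values of $y$, $M$, and $N$ below are hypothetical.

```python
def cost_problem_specific(y: float, M: int) -> float:
    """C1 ~ M * y: each of the M tasks needs a full label budget y."""
    return M * y

def cost_foundation_model(y: float, M: int, N: float) -> float:
    """C2 ~ y * (1 + (N/100) * M): one full-budget pretraining, plus
    N percent of y per downstream task for adaptation."""
    return y * (1 + (N / 100) * M)

# Hypothetical numbers: 5 tasks, labelling budget 1000 per task,
# and 15% of the labels needed to adapt the Foundation Model per task.
y, M, N = 1000.0, 5, 15
c1 = cost_problem_specific(y, M)      # 5000.0
c2 = cost_foundation_model(y, M, N)   # 1000 * (1 + 0.75) = 1750.0
```

With these (made-up) values the Foundation Model route costs well under half of the problem-specific route, and the gap widens as $M$ grows while $N$ stays small.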
Abstract
When the goal is to solve several problems jointly, each at a prescribed high accuracy, Foundation Models should in most cases be used rather than problem-specific models. We focus on the specific Computer Vision application of Foundation Models for Earth Observation (EO) and geospatial AI. These models can solve important problems we are tackling, including land cover classification, crop type mapping, flood segmentation, building density estimation, and road regression segmentation. In this paper, we show that when labelled data are limited, Foundation Models achieve improved performance compared to problem-specific models. We also present our proposed evaluation benchmark for Foundation Models for EO. Benchmarking the generalization performance of Foundation Models is important because it has become difficult to standardize a fair comparison across the many different models that have been proposed recently. We present results using our evaluation benchmark for EO Foundation Models and show that Foundation Models are label efficient on downstream tasks and help solve the problems we are tackling in EO and remote sensing.
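The linear-probing setting mentioned above (training only a lightweight head on frozen Foundation Model features) can be sketched as follows. This is an illustrative toy, not the paper's pipeline: the frozen random-projection "encoder", the synthetic data, and the closed-form ridge head are all stand-in assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
W_frozen = rng.normal(size=(64, 16))   # stands in for pretrained encoder weights

def encode(x: np.ndarray) -> np.ndarray:
    """Frozen feature extractor: in linear probing it receives no updates."""
    return np.maximum(x @ W_frozen, 0.0)   # ReLU features

# Tiny labelled set, mimicking the label-limited downstream regime.
X = rng.normal(size=(40, 64))
y = (X[:, 0] > 0).astype(float)            # synthetic binary labels

# Only the linear head is trained; here via closed-form ridge regression.
F = encode(X)
head = np.linalg.solve(F.T @ F + 1e-2 * np.eye(16), F.T @ y)
preds = (encode(X) @ head > 0.5).astype(float)
accuracy = (preds == y).mean()
```

Fine-tuning would instead also update the encoder weights; the benchmark in the paper evaluates models under both regimes.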
