Ecological mapping with geospatial foundation models
Craig Mahlasi, Gciniwe S. Baloyi, Zaheed Gaffoor, Levente Klein, Anne Jones, Etienne Vos, Michal Muszynski, Geoffrey Dawson, Campbell Watson
TL;DR
Geospatial foundation models are explored for ecological mapping, addressing domain gaps and labeling challenges. The authors fine-tune Prithvi-EO-2.0 and TerraMind and compare them to a ResNet-101 baseline, evaluating zero-shot land use and land cover (LULC) generation and downstream tasks on NEON and Karukinka sites using multi-modal inputs and carefully derived labels. GFMs generally outperform the ResNet baseline, with TerraMind achieving the strongest results when additional modalities are incorporated, though performance hinges on input resolution and label fidelity. The work demonstrates the potential of geospatial foundation models for fine-grained ecological mapping while outlining practical limitations and directions for improving data quality, labels, and multimodal fusion in geospatial applications.
Abstract
Geospatial foundation models (GFMs) are a fast-emerging paradigm for various geospatial tasks, such as ecological mapping. However, the utility of GFMs has not been fully explored for high-value use cases. This study explores the utility, challenges and opportunities associated with applying GFMs to ecological use cases. We fine-tune two pretrained GFMs, Prithvi-EO-2.0 and TerraMind, across three use cases and compare them with a baseline ResNet-101 model. First, we demonstrate TerraMind's land use and land cover (LULC) generation capabilities. We then explore the utility of the GFMs in forest functional trait mapping and peatland detection. In all experiments, the GFMs outperform the baseline ResNet model. In general, TerraMind marginally outperforms Prithvi. With additional modalities, however, TerraMind significantly outperforms both the baseline ResNet and Prithvi models. Nonetheless, consideration should be given to the divergence of the input data from the modalities used in pretraining. We note that these models would benefit from higher-resolution inputs and more accurate labels, especially for use cases where pixel-level dynamics need to be mapped.
