Fine-tuning of Geospatial Foundation Models for Aboveground Biomass Estimation
Michal Muszynski, Levente Klein, Ademir Ferreira da Silva, Anjani Prasad Atluri, Carlos Gomes, Daniela Szwarcman, Gurkanwar Singh, Kewen Gu, Maciel Zortea, Naomi Simumba, Paolo Fraccaro, Shraddha Singh, Steve Meliksetian, Campbell Watson, Daiki Kimura, Harini Srinivasan
TL;DR
This work investigates fine-tuning geospatial foundation models (GFMs) to estimate above-ground biomass (AGB) from space-borne imagery in Brazil, using a frozen Swin-B encoder and a modified UPerNet decoder. Compared with a U-Net trained from scratch, the frozen-encoder GFMs achieve comparable RMSE while requiring roughly 13x fewer trainable parameters, demonstrating substantial training efficiency. The study also evaluates transfer learning across Brazilian eco-regions, showing robust performance for low AGB values and highlighting region-specific differences in moderate AGB ranges. These findings suggest GFMs provide a scalable, label-efficient approach for large-scale biomass mapping and carbon monitoring, with potential extensions to multimodal data and cloudy-sky scenarios.
Abstract
Global vegetation structure mapping is critical for understanding the global carbon cycle and maximizing the efficacy of nature-based carbon sequestration initiatives. Moreover, vegetation structure mapping can help reduce the impacts of climate change by, for example, guiding actions to improve water security, increase biodiversity and reduce flood risk. Global satellite measurements provide an important set of observations for monitoring and managing deforestation and degradation of existing forests, natural forest regeneration, reforestation, biodiversity restoration, and the implementation of sustainable agricultural practices. In this paper, we explore the effectiveness of fine-tuning a geospatial foundation model to estimate above-ground biomass (AGB) using space-borne data collected across different eco-regions in Brazil. The fine-tuned model architecture consisted of a Swin-B transformer as the encoder (i.e., backbone) and a single convolutional layer for the decoder head. All results were compared with a U-Net trained as the baseline model. Experimental results of this sparse-label prediction task demonstrate that the fine-tuned geospatial foundation model with a frozen encoder has comparable performance to a U-Net trained from scratch. This is despite the fine-tuned model having 13 times fewer parameters requiring optimization, which saves both time and compute resources. Further, we explore the transfer-learning capabilities of the geospatial foundation models by fine-tuning on satellite imagery with sparse labels from different eco-regions in Brazil.
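To make the sparse-label setting concrete, the sketch below shows one way an RMSE could be computed when only some pixels carry an AGB label. It is an illustration under stated assumptions, not the evaluation code used in this work: the function name `masked_rmse`, the use of NaN to mark unlabeled pixels, and the toy values are all hypothetical.

```python
import math


def masked_rmse(pred, target):
    """RMSE over labeled pixels only.

    Assumption (not from the paper): unlabeled pixels in `target`
    are marked with NaN and are skipped entirely.
    """
    sq_err = 0.0
    n_labeled = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if math.isnan(t):
                continue  # no label here, so no contribution to the error
            sq_err += (p - t) ** 2
            n_labeled += 1
    if n_labeled == 0:
        raise ValueError("target contains no labeled pixels")
    return math.sqrt(sq_err / n_labeled)


NAN = float("nan")

# Toy 2x2 tile: only two of the four pixels have a label.
pred = [[10.0, 20.0], [30.0, 40.0]]
target = [[13.0, NAN], [NAN, 36.0]]

rmse = masked_rmse(pred, target)
print(f"masked RMSE: {rmse:.4f}")
```

Only the two labeled pixels contribute here (errors of 3 and 4), so predictions at unlabeled locations neither help nor hurt the score. The same masking idea would apply to a training loss, although the loss actually used in this work is not stated in this section.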
