On the Generalizability of Foundation Models for Crop Type Mapping

Yi-Chia Chang, Adam J. Stewart, Favyen Bastani, Piper Wolters, Shreya Kannan, George R. Huber, Jingtong Wang, Arindam Banerjee

TL;DR

This paper studies cross-regional generalization in crop type mapping by evaluating EO foundation models on a harmonized global dataset spanning five continents. It compares three sets of pre-trained weights (SSL4EO-S12, SatlasPretrain, and ImageNet) under in-distribution (ID) and out-of-distribution (OOD) transfer scenarios, using a ResNet-50 + U-Net architecture and a 90/10 train-test split. SSL4EO-S12, pre-trained on all Sentinel-2 bands, delivers the best performance. OOD pre-training helps when ID data are scarce but can hurt when combined with substantial ID data because of distribution shift. About 100 ID samples are enough for high overall accuracy, but roughly 900 are needed for high average accuracy, which is more sensitive to class imbalance. The work highlights the need for larger, region-balanced datasets, especially in Africa and South America, and suggests imbalance-aware training to improve performance on rare classes in global crop type mapping.

Abstract

Foundation models pre-trained using self-supervised learning have shown powerful transfer learning capabilities on various downstream tasks, including language understanding, text generation, and image recognition. The Earth observation (EO) field has produced several foundation models pre-trained directly on multispectral satellite imagery for applications like precision agriculture, wildfire and drought monitoring, and natural disaster response. However, few studies have investigated the ability of these models to generalize to new geographic locations, and potential concerns of geospatial bias (models trained on data-rich developed nations not transferring well to data-scarce developing nations) remain. We evaluate three popular sets of pre-trained weights, SSL4EO-S12, SatlasPretrain, and ImageNet, on five crop classification datasets across five continents. Results show that pre-trained weights designed explicitly for Sentinel-2, such as SSL4EO-S12, outperform general pre-trained weights like ImageNet. While as few as 100 labeled images are sufficient for achieving high overall accuracy, 900 images are required to mitigate class imbalance and improve average accuracy.
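
The gap between overall and average accuracy is easier to see on a toy example. The sketch below assumes that "overall" means pooling all pixels (micro averaging) and "average" means the unweighted mean of per-class accuracy (macro averaging), which is the common convention but is not spelled out in the summary above. The function names and the label lists are hypothetical and are not taken from the paper.

```python
from collections import Counter


def overall_accuracy(y_true, y_pred):
    """Fraction of all samples labeled correctly (micro average)."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def average_accuracy(y_true, y_pred):
    """Unweighted mean of per-class accuracy (macro average)."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Hypothetical toy labels: 90 pixels of a common crop (0), 10 of a rare crop (1).
y_true = [0] * 90 + [1] * 10
# A degenerate model that predicts the common crop everywhere.
y_pred = [0] * 100

print(overall_accuracy(y_true, y_pred))  # 0.9
print(average_accuracy(y_true, y_pred))  # 0.5
```

A model that ignores the rare class still scores 0.9 overall but only 0.5 on average, which is one plausible reason more labeled data is needed before average accuracy rises.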

Paper Structure

This paper contains 15 sections, 2 figures, and 3 tables.

Figures (2)

  • Figure 1: Reported metrics of ID, OOD + ID, and balanced OOD + ID using SSL4EO-S12 pre-trained weights. Average and overall metrics are given for F1-score, precision, recall, accuracy, and Jaccard Index (IoU). For all metrics, higher is better.
  • Figure 2: Visualization of example input Sentinel-2 images, ground truth masks, and model predictions using SSL4EO-S12 pre-trained weights. Overall results are promising, with the models capturing the general class distribution and correctly identifying most fields.
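
Figure 1 reports both average and overall variants of the Jaccard index. As a rough illustration of how the two can be computed, the sketch below assumes that "overall" pools true positives, false positives, and false negatives across classes before dividing, and that "average" takes the unweighted mean of per-class IoU. The helper `jaccard_scores` and the toy labels are hypothetical and are not the paper's evaluation code.

```python
def jaccard_scores(y_true, y_pred, num_classes):
    """Return (overall, average) Jaccard index for flat integer label lists."""
    tp = [0] * num_classes
    fp = [0] * num_classes
    fn = [0] * num_classes
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p gains a false positive
            fn[t] += 1  # true class t gains a false negative

    # Overall (micro): pool counts across classes, then divide once.
    overall = sum(tp) / (sum(tp) + sum(fp) + sum(fn))

    # Average (macro): IoU per class, skipping classes absent from both
    # the labels and the predictions, then an unweighted mean.
    per_class = [
        tp[c] / (tp[c] + fp[c] + fn[c])
        for c in range(num_classes)
        if tp[c] + fp[c] + fn[c] > 0
    ]
    average = sum(per_class) / len(per_class)
    return overall, average


# Hypothetical toy labels: one of two rare-class pixels is misclassified.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 8 + [0, 1]
overall_iou, average_iou = jaccard_scores(y_true, y_pred, num_classes=2)
print(overall_iou, average_iou)
```

Here a single error on the rare class costs the average IoU more than the overall IoU, since each class contributes equally to the average regardless of its pixel count.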