Fields of The World: A Machine Learning Benchmark Dataset For Global Agricultural Field Boundary Segmentation
Hannah Kerner, Snehal Chaudhari, Aninda Ghosh, Caleb Robinson, Adeel Ahmad, Eddie Choi, Nathan Jacobs, Chris Holmes, Matthias Mohr, Rahul Dodhia, Juan M. Lavista Ferres, Jennifer Marcus
TL;DR
Fields of The World (FTW) is a globally diverse benchmark for agricultural field boundary segmentation, comprising 70,462 samples from 24 countries that pair Sentinel-2 imagery with fiboa-standardized polygon annotations. The dataset supports multi-temporal, multispectral inputs and provides 2- and 3-class target masks, so it can be used to evaluate both semantic and instance segmentation approaches. Splits are defined per country using a spatially aware 3×3 block design. Baseline experiments show that 3-class masks and two temporal windows with RGB-NIR channels improve performance, and that FTW pretraining yields strong zero-shot and transfer learning results, even in regions not included in training. The work demonstrates deployment potential (e.g., in Ethiopia) and outlines future extensions: broader benchmarking, architecture exploration (including foundation models), richer metadata, and guidance for region-specific evaluation.
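To make the distinction between the 2- and 3-class target masks concrete, the following is a minimal sketch of how both could be derived from an instance mask. It is not the authors' rasterization pipeline. The class encoding (0 = background, 1 = field interior, 2 = field boundary), the 4-neighbor boundary rule, and the function name `masks_from_instances` are all assumptions made for illustration.

```python
import numpy as np

def masks_from_instances(instances: np.ndarray):
    """Derive illustrative 2- and 3-class masks from an instance mask.

    instances: 2D integer array, 0 = background, k > 0 = field id.
    Assumed encoding (hypothetical, for illustration only):
      2-class: 0 = background, 1 = field
      3-class: 0 = background, 1 = field interior, 2 = field boundary
    """
    h, w = instances.shape
    field = instances > 0

    # Pad with edge values so the image border itself is not treated
    # as a boundary between fields.
    padded = np.pad(instances, 1, mode="edge")

    # A field pixel is a boundary pixel if any 4-neighbor has a different id.
    boundary = np.zeros((h, w), dtype=bool)
    for dy, dx in ((0, 1), (2, 1), (1, 0), (1, 2)):
        neighbor = padded[dy:dy + h, dx:dx + w]
        boundary |= field & (neighbor != instances)

    two_class = field.astype(np.uint8)
    three_class = two_class.copy()
    three_class[boundary] = 2
    return two_class, three_class

# Toy example: a single 3x3 field centered in a 5x5 tile.
instances = np.zeros((5, 5), dtype=np.int32)
instances[1:4, 1:4] = 1
two_class, three_class = masks_from_instances(instances)
```

In this toy tile the 2-class mask marks all nine field pixels, while the 3-class mask keeps only the center pixel as interior and labels the surrounding ring as boundary. Giving the boundary its own class is what lets adjacent fields be separated into instances after semantic prediction.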
Abstract
Crop field boundaries are foundational datasets for agricultural monitoring and assessments but are expensive to collect manually. Machine learning (ML) methods for automatically extracting field boundaries from remotely sensed images could help meet the demand for these datasets at a global scale. However, current ML methods for field instance segmentation lack sufficient geographic coverage, accuracy, and generalization capabilities. Further, research on improving ML methods is restricted by the lack of labeled datasets representing the diversity of global agricultural fields. We present Fields of The World (FTW), a novel ML benchmark dataset for agricultural field instance segmentation spanning 24 countries on four continents (Europe, Africa, Asia, and South America). With 70,462 samples, FTW is an order of magnitude larger than previous datasets; each sample contains instance and semantic segmentation masks paired with multi-date, multi-spectral Sentinel-2 satellite images. We provide results from baseline models for the new FTW benchmark and show that models trained on FTW achieve better zero-shot and fine-tuning performance in held-out countries than models that are not pre-trained on diverse datasets. We also show positive qualitative zero-shot results of FTW models in a real-world scenario: running on Sentinel-2 scenes over Ethiopia.
