A region-wide, multi-year set of crop field boundary labels for Africa
L. D. Estes, A. Wussah, M. Asipunu, M. Gathigi, P. Kovačič, J. Muhando, B. V. Yeboah, F. K. Addai, E. S. Akakpo, M. K. Allotey, P. Amkoya, E. Amponsem, K. D. Donkoh, N. Ha, E. Heltzel, C. Juma, R. Mdawida, A. Miroyo, J. Mucha, J. Mugami, F. Mwawaza, D. A. Nyarko, P. Oduor, K. N. Ohemeng, S. I. D. Segbefia, T. Tumbula, F. Wambua, G. H. Xeflide, S. Ye, F. Yeboah
TL;DR
This study delivers a region-wide, multi-year dataset of crop field boundary labels for Africa, built from Planet NICFI imagery (2017–2023) and comprising 33,746 sites with 42,403 labels across three classes (non-field, field interior, field edge). Labels were collected on a custom labeling platform that combines labeller training, on-platform quality control (QC), and expert review, and a Bayesian risk framework quantifies label uncertainty at sites mapped by multiple labellers. The dataset includes both high- and lower-quality labels. The vectorized labels and imagery are publicly available to support the development of boundary-aware semantic segmentation models and to characterize regional cropland properties such as field size and density. The work demonstrates the feasibility and value of large-scale, multi-year labelling in Africa, acknowledges the limitations imposed by image resolution, and offers guidance for future data collection and modeling efforts.
Abstract
African agriculture is undergoing rapid transformation. Annual maps of crop fields are key to understanding the nature of this transformation, but such maps are currently lacking and must be developed using advanced machine learning models trained on high resolution remote sensing imagery. To enable the development of such models, we delineated field boundaries in 33,746 Planet images captured between 2017 and 2023 across the continent using a custom labeling platform with built-in procedures for assessing and mitigating label error. We collected 42,403 labels, including 7,204 labels arising from tasks dedicated to assessing label quality (Class 1 labels), 32,167 from sites mapped once by a single labeller (Class 2), and 3,032 labels from sites where three or more labellers were tasked to map the same location (Class 4). Class 1 labels were used to calculate labeller-specific quality scores, while Class 1 and 4 sites mapped by at least three labellers were used to further evaluate label uncertainty using a Bayesian risk metric. Quality metrics showed that label quality was moderately high (0.75) for measures of total field extent, but low for the number of individual fields delineated (0.33) and for the position of field edges (0.05). These values are expected when delineating small-scale fields in 3–5 m resolution imagery, which can be too coarse to reliably distinguish smaller fields, particularly in dense croplands, so that delineation requires substantial labeller judgement. Nevertheless, previous work shows that such labels can train effective field mapping models. Furthermore, this large, probabilistic sample on its own provides valuable insight into regional agricultural characteristics, highlighting variations in median field size and density. The imagery and vectorized labels, along with quality information, are available for download from two public repositories.
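To make the idea of multi-labeller uncertainty concrete, the sketch below shows one generic way to combine several labellers' masks for the same site into a per-pixel field probability and a per-pixel risk. It is an illustration under stated assumptions, not the Bayesian risk metric used in this work: the function name `consensus_and_risk`, the use of labeller quality scores as normalized weights, and the definition of risk as the expected 0-1 loss of the most probable class, min(p, 1 - p), are all hypothetical choices made for the example.

```python
from typing import List, Sequence, Tuple

Mask = Sequence[Sequence[int]]  # rows of 0 (non-field) / 1 (field) pixels


def consensus_and_risk(
    masks: Sequence[Mask], weights: Sequence[float]
) -> Tuple[List[List[float]], List[List[float]]]:
    """Combine K binary masks into a weighted field probability and a risk map.

    Assumptions (illustrative only): each weight is a non-negative quality
    score for one labeller, and the normalized weighted vote is treated as
    the probability that a pixel is field.
    """
    if not masks or len(masks) != len(weights):
        raise ValueError("need one weight per mask and at least one mask")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    rows, cols = len(masks[0]), len(masks[0][0])
    # Weighted share of labellers who marked each pixel as field.
    prob = [
        [
            sum(w * m[r][c] for m, w in zip(masks, weights)) / total
            for c in range(cols)
        ]
        for r in range(rows)
    ]
    # Expected 0-1 loss when the more probable class is chosen per pixel.
    risk = [[min(p, 1.0 - p) for p in row] for row in prob]
    return prob, risk


# Three hypothetical labellers mapping a 1 x 2 pixel site.
example_masks = [[[1, 0]], [[1, 1]], [[0, 0]]]
example_weights = [0.5, 0.25, 0.25]
example_prob, example_risk = consensus_and_risk(example_masks, example_weights)
```

Under these assumptions, risk is 0 where labellers agree and approaches 0.5 where the weighted votes are evenly split, so averaging the risk map over a site gives a single uncertainty value that can be compared across sites.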
