A region-wide, multi-year set of crop field boundary labels for Africa

L. D. Estes, A. Wussah, M. Asipunu, M. Gathigi, P. Kovačič, J. Muhando, B. V. Yeboah, F. K. Addai, E. S. Akakpo, M. K. Allotey, P. Amkoya, E. Amponsem, K. D. Donkoh, N. Ha, E. Heltzel, C. Juma, R. Mdawida, A. Miroyo, J. Mucha, J. Mugami, F. Mwawaza, D. A. Nyarko, P. Oduor, K. N. Ohemeng, S. I. D. Segbefia, T. Tumbula, F. Wambua, G. H. Xeflide, S. Ye, F. Yeboah

TL;DR

This study delivers a region-wide, multi-year dataset of crop field boundary labels for Africa, built from Planet NICFI imagery (2017–2023) and comprising 33,746 sites with 42,403 labels, each distinguishing three classes (non-field, field interior, field edge). A custom labeling platform with training, on-platform QC, expert reviews, and a Bayesian risk framework provides both high- and lower-quality labels and quantifies label uncertainty across multi-labeller sites. The vectorized labels and imagery are publicly available to support boundary-aware semantic segmentation model development and to illuminate regional cropland characteristics such as field size and density. The work demonstrates the feasibility and value of large-scale, multi-year labelling in Africa while acknowledging resolution-related limitations and offering guidance for future data collection and modeling efforts.

Abstract

African agriculture is undergoing rapid transformation. Annual maps of crop fields are key to understanding the nature of this transformation, but such maps are currently lacking and must be developed using advanced machine learning models trained on high-resolution remote sensing imagery. To enable the development of such models, we delineated field boundaries in 33,746 Planet images captured between 2017 and 2023 across the continent using a custom labeling platform with built-in procedures for assessing and mitigating label error. We collected 42,403 labels, including 7,204 labels arising from tasks dedicated to assessing label quality (Class 1 labels), 32,167 from sites mapped once by a single labeller (Class 2), and 3,032 labels from sites where three or more labellers were tasked to map the same location (Class 4). Class 1 labels were used to calculate labeller-specific quality scores, while Class 1 and 4 sites mapped by at least three labellers were used to further evaluate label uncertainty using a Bayesian risk metric. Quality metrics showed that label quality was moderately high (0.75) for measures of total field extent, but low for the number of individual fields delineated (0.33) and the position of field edges (0.05). These values are expected when delineating small-scale fields in 3–5 m resolution imagery, which can be too coarse to reliably distinguish smaller fields, particularly in dense croplands, and therefore requires substantial labeller judgement. Nevertheless, previous work shows that such labels can train effective field mapping models. Furthermore, this large, probabilistic sample on its own provides valuable insight into regional agricultural characteristics, highlighting variations in the median field size and density. The imagery and vectorized labels, along with quality information, are available for download from two public repositories.
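
As a quick consistency check on the label counts reported in the abstract, the short Python sketch below sums the three label classes and computes each class's share of the total. The counts come directly from the abstract; the variable names are illustrative only and are not part of the published dataset.

```python
# Label counts as reported in the abstract.
label_counts = {
    "Class 1": 7204,   # tasks dedicated to assessing label quality
    "Class 2": 32167,  # sites mapped once by a single labeller
    "Class 4": 3032,   # sites mapped by three or more labellers
}
reported_total = 42403

# Sum the per-class counts and derive each class's share of all labels.
total = sum(label_counts.values())
shares = {name: count / total for name, count in label_counts.items()}

for name, count in label_counts.items():
    print(f"{name}: {count:>6,} labels ({shares[name]:.1%})")
print(f"Total: {total:,} (matches reported total: {total == reported_total})")
```

The per-class counts add up to the reported 42,403 labels, and roughly three quarters of them are single-labeller (Class 2) labels.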

Paper Structure

This paper contains 18 sections, 3 equations, 8 figures.

Figures (8)

  • Figure 1: A Planet image (top) with a corresponding image label with three classes distinguishing the non-field background (black), field interior (grey), and field edge (white).
  • Figure 2: A view of the labelling platform's interface, showing the target box and a true color rendering of the Planet imagery to label at that location. Labellers were shown a larger image extent to give additional context. The imagery was also rendered in false color to aid interpretation, along with several virtual globe basemaps to provide higher resolution views.
  • Figure 3: The four dimensions used to assess label quality relative to Class 1 reference polygons. A = Correctly labelled areas inside the target cell; B = Fragmentation accuracy, or the agreement between the number of fields mapped by the reference (Class 1) label and the labeller; C = Edge accuracy; D = Categorical accuracy.
  • Figure 4: The locations of the 33,746 sites that were labelled, indicated by blue crosses.
  • Figure 5: Label risk, a measure of label uncertainty for sites mapped by three or more labellers. Panels A and C show the mean risk per pixel, mapped into 0.5° pixels and shown as a histogram, while panels B and D relate to the proportions of each site covered by risky pixels (r>0.34).
  • ...and 3 more figures
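
The three-class label described for Figure 1 (non-field, field interior, field edge) can be illustrated with a minimal NumPy sketch. This is not the authors' rasterization procedure: the integer class codes, the function name `three_class_label`, and the rule that an edge pixel is any field pixel with a non-field 4-connected neighbour are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical integer codes for the three classes (not from the paper).
NON_FIELD, INTERIOR, EDGE = 0, 1, 2


def three_class_label(field_mask: np.ndarray) -> np.ndarray:
    """Split a binary field mask into non-field, interior, and edge pixels.

    Assumed rule: a field pixel is an edge pixel if any of its four
    neighbours is non-field or lies outside the array.
    """
    mask = field_mask.astype(bool)
    # Pad with non-field so pixels on the tile border count as edge.
    padded = np.pad(mask, 1, constant_values=False)
    # A pixel is interior only if it and its four neighbours are all field.
    interior = (
        padded[1:-1, 1:-1]
        & padded[:-2, 1:-1]   # neighbour above
        & padded[2:, 1:-1]    # neighbour below
        & padded[1:-1, :-2]   # neighbour to the left
        & padded[1:-1, 2:]    # neighbour to the right
    )
    label = np.full(mask.shape, NON_FIELD, dtype=np.uint8)
    label[mask] = EDGE          # start by marking every field pixel as edge
    label[interior] = INTERIOR  # then overwrite the interior pixels
    return label


# Toy example: a single 4x4 field inside a 6x6 tile.
tile = np.zeros((6, 6), dtype=bool)
tile[1:5, 1:5] = True
label = three_class_label(tile)
print(label)
```

In practice each field polygon would need to be rasterized and processed separately, since two adjacent fields merged into one mask would lose their shared boundary under this rule.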