EuroCropsML: A Time Series Benchmark Dataset For Few-Shot Crop Type Classification

Joana Reuss, Jan Macdonald, Simon Becker, Lorenz Richter, Marco Körner

TL;DR

EuroCropsML addresses the need for a transnational, few-shot-capable benchmark for crop type classification by introducing a time series dataset built from 2021 Sentinel-2 L1C observations across three European regions of interest (ROIs) with a harmonized crop taxonomy. The authors present a two-stage data pipeline (data acquisition and pre-processing) that produces both raw and ready-to-use ML data, including cloud removal and per-parcel median band values for each time step. They define transfer-learning benchmark scenarios with eight few-shot settings and run baseline experiments with a transformer encoder, which highlight the value of region-specific pre-training for cross-region generalization. The dataset is openly available on Zenodo, together with the eurocropsml Python package, which supports acquisition, processing, and benchmark configuration. This enables reproducible cross-region crop type classification research with ready-made splits and configurable experiments.

Abstract

We introduce EuroCropsML, an analysis-ready remote sensing machine learning dataset for time series crop type classification of agricultural parcels in Europe. It is the first dataset designed to benchmark transnational few-shot crop type classification algorithms that supports advancements in algorithmic development and research comparability. It comprises 706 683 multi-class labeled data points across 176 classes, featuring annual time series of per-parcel median pixel values from Sentinel-2 L1C data for 2021, along with crop type labels and spatial coordinates. Based on the open-source EuroCrops collection, EuroCropsML is publicly available on Zenodo.

Paper Structure

This paper contains 12 sections, 8 figures, and 4 tables.

Figures (8)

  • Figure 1: Visualization of crop fields (using EuroCrops HCAT3 level 3 [schneider_eurocrops_2021, eurocrops_github]) for Estonia (EE), Latvia (LV), and Portugal (PT). Most of Latvia and Estonia consists of expansive clusters of arable crops and of the pasture/meadow/grassland/grass class. In contrast, Portugal shows a more diverse distribution among the classes, with crop fields dispersed more widely across the country.
  • Figure 2: (a) Number of parcels (log scale) of a given size within Estonia, Latvia, Portugal, and the overall EuroCropsML dataset. The histogram bin width for the parcel sizes is 0.25 km. (b) Number of parcels (log scale) for each of the 176 distinct EuroCropsML crop classes (HCAT3 level 6 [schneider_eurocrops_2021, schneider_eurocrops_2023]) within Estonia, Latvia, Portugal, and the overall EuroCropsML dataset. The meadow class is by far the most prevalent crop class.
  • Figure 3: Overview of the data acquisition and pre-processing pipeline for the EuroCropsML dataset. The names in the blue headers correspond to the location and module names of the respective step in the associated Python package, available at https://github.com/dida-do/eurocropsml (cf. the Usage Notes).
  • Figure 4: Illustration of the data processing to obtain a median pixel time series from Sentinel-2 raster tiles for a single agricultural parcel. All Sentinel-2 tiles for the year 2021 that overlap with the parcel's geometry (red polygon) are collected. For each individual time step and Sentinel-2 band, they are clipped to the extent of the polygon before calculating the median pixel value for each time step and band individually, resulting in a multi-spectral time series.
  • Figure 5: Data records in Zenodo, referring to version 8 of the dataset.
  • ...and 3 more figures
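
The per-parcel aggregation described in the Figure 4 caption can be sketched as follows. This is a minimal illustration, not the eurocropsml implementation: the function name `parcel_median_series`, the nested-list raster layout, and the pre-computed boolean parcel mask are all assumptions made for this example, and plain Python lists stand in for real raster arrays. It assumes the tiles have already been clipped to the parcel's extent and only shows the median step for each time step and band.

```python
from statistics import median


def parcel_median_series(tiles, parcel_mask):
    """Reduce clipped rasters to a median pixel time series (hypothetical helper).

    tiles[t][b] is a 2D grid (list of rows) of pixel values for time step t
    and band b. parcel_mask is a 2D grid of booleans with the same shape,
    True where a pixel lies inside the parcel polygon.
    Returns series[t][b], the median over all in-parcel pixels.
    """
    # Collect the (row, column) positions of all pixels inside the parcel.
    inside = [
        (r, c)
        for r, row in enumerate(parcel_mask)
        for c, flag in enumerate(row)
        if flag
    ]
    if not inside:
        raise ValueError("parcel mask selects no pixels")

    # One median per time step and band, computed independently.
    return [
        [median(band[r][c] for r, c in inside) for band in step]
        for step in tiles
    ]


# Toy example: 2 time steps, 2 bands, a 2x3 pixel window.
mask = [
    [False, True, True],
    [False, True, False],
]
tiles = [
    [  # time step 0
        [[1, 2, 3], [4, 5, 6]],        # band 0
        [[10, 20, 30], [40, 50, 60]],  # band 1
    ],
    [  # time step 1
        [[7, 8, 9], [1, 2, 3]],        # band 0
        [[0, 0, 0], [0, 0, 0]],        # band 1
    ],
]
series = parcel_median_series(tiles, mask)
print(series)  # [[3, 30], [8, 0]]
```

Each row of the result corresponds to one time step and each column to one band, so the output is a multi-spectral time series for a single parcel.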