PhenoBench -- A Large Dataset and Benchmarks for Semantic Image Interpretation in the Agricultural Domain

Jan Weyler; Federico Magistri; Elias Marks; Yue Linn Chong; Matteo Sodano; Gianmarco Roggiolani; Nived Chebrolu; Cyrill Stachniss; Jens Behley

TL;DR

PhenoBench delivers a large UAV-based RGB dataset of real sugar beet fields with densely labeled plant and leaf instances and temporally aligned IDs across multiple dates, enabling robust evaluation of semantic, instance, panoptic, leaf, and hierarchical segmentation tasks. It provides a hidden-test benchmark with server-side evaluation and baseline results from both general and agricultural-domain models, illustrating the unique challenges of weed segmentation and leaf-plant coupling in field conditions. The study also showcases a community challenge at ICCV 2023, where foundation-model–driven approaches began to outperform baselines, and discusses the broader impact on self-supervised learning and cross-domain generalization in agricultural perception. Overall, PhenoBench aims to accelerate progress in field-aware vision for sustainable agriculture by supplying high-quality data, diverse tasks, and reproducible evaluation tools, with direct implications for breeding, precision farming, and automated phenotyping.

Abstract

The production of food, feed, fiber, and fuel is a key task of agriculture, which has to cope with many challenges in the upcoming decades, e.g., higher demand, climate change, a lack of workers, and the availability of arable land. Vision systems can support better and more sustainable field management decisions, and can also support the breeding of new crop varieties by allowing temporally dense and reproducible measurements. Recently, agricultural robotics has attracted increasing interest in the vision and robotics communities since it is a promising avenue for coping with the aforementioned lack of workers and enabling more sustainable production. While large datasets and benchmarks in other domains are readily available and have enabled significant progress, agricultural datasets and benchmarks are comparatively rare. We present an annotated dataset and benchmarks for the semantic interpretation of real agricultural fields. Our dataset, recorded with a UAV, provides high-quality, pixel-wise annotations of crops and weeds as well as crop leaf instances. Furthermore, we provide benchmarks for various tasks on a hidden test set comprising different fields: known fields covered by the training data and a completely unseen field. Our dataset, benchmarks, and code are available at https://www.phenobench.org.

Paper Structure (17 sections, 7 figures, 8 tables)

Introduction
Related Work
Our Dataset
Data Collection
Labeling Process
Temporal Alignment
Dataset Statistics
Benchmarks
Semantic Segmentation
Panoptic Segmentation
Detection
Leaf Instance Segmentation
Hierarchical Panoptic Segmentation
Challenge in Conjunction with CVPPA Workshop at IEEE/CVF ICCV 2023
...and 2 more sections

Figures (7)

Figure 1: Our dataset, called PhenoBench, provides dense semantic plant-level instance annotations (shown by different colors) of sugar beet crops and weeds (green and red in the semantics) and leaf-level instance annotations of crops (different colors correspond to different instances) for high-resolution images recorded with a UAV. The dataset consists of images collected at different times during a growing season, which captures various growth stages of plants.
Figure 2: Variability in overlap and illumination of plants at the same part of the field on different recording dates. These examples show the variation in growth stages, ranging from the four-leaf stage (early growth stage) to plants with over 20 leaves (later growth stage), and the variety of illumination under sunny (left) and overcast (right) weather conditions.
Figure 3: Orthophoto of the field recorded in 2020 and our spatial separation into rows for training (green), validation (blue), and testing (red). Due to the geo-referencing of the images, we extracted the same rows on each of the dates.
Figure 4: Varying conditions of the field recorded at different locations, which are treated with different amounts of herbicides. From left to right: fully-herbicided, partially-herbicided, and non-herbicided field conditions recorded on the same day.
Figure 5: Extracted tiles per iteration such that a row is densely covered with tiles to ensure that all plants are completely visible in at least one tile. Annotations of tiles are transferred between iterations and aggregated in the global image $I_g$.
...and 2 more figures
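
The dense tile coverage described in the Figure 5 caption can be illustrated with a small sketch. This is not the authors' labeling tool: the function names `tile_starts` and `fully_visible`, the one-dimensional view along a row, and the tile size and stride values are all assumptions made for illustration. The idea shown is only that tiles overlapping by `tile_size - stride` pixels guarantee that any plant no wider than that overlap lies completely inside at least one tile.

```python
def tile_starts(row_length: int, tile_size: int, stride: int) -> list[int]:
    """Start offsets of overlapping tiles covering [0, row_length) (hypothetical helper)."""
    if not 0 < stride <= tile_size:
        raise ValueError("stride must be in (0, tile_size]")
    if row_length <= tile_size:
        return [0]
    starts = list(range(0, row_length - tile_size + 1, stride))
    # Clamp a final tile to the end of the row so no pixels are left uncovered.
    if starts[-1] != row_length - tile_size:
        starts.append(row_length - tile_size)
    return starts


def fully_visible(start: int, end: int, starts: list[int], tile_size: int) -> bool:
    """True if the extent [start, end) lies completely inside at least one tile."""
    return any(s <= start and end <= s + tile_size for s in starts)


# Illustrative values only; they are not taken from the dataset description.
starts = tile_starts(row_length=2500, tile_size=1024, stride=512)
print(starts)
print(fully_visible(900, 1300, starts, tile_size=1024))
```

With these assumed values the overlap is 512 pixels, so every extent up to 512 pixels wide is contained in some tile; a smaller stride raises that bound at the cost of more tiles to annotate.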
