A Leaf-Level Dataset for Soybean-Cotton Detection and Segmentation

Thiago H. Segreto, Juliano Negri, Paulo H. Polegato, João Manoel Herrera Pinheiro, Ricardo Godoy, Marcelo Becker

TL;DR

To enable robust leaf-level detection and segmentation under field conditions, the authors assembled a dataset of soybean and cotton leaves annotated with instance masks and bounding boxes, drawn from 640 high-resolution images captured across varying growth stages, weed pressures, and lighting conditions. Ground truth was produced in CVAT with assistance from SAM and stored in COCO format; the dataset was validated with a YOLOv11 medium model using five-fold cross-validation and data-ablation analyses. It comprises 7,221 soybean leaves and 5,190 cotton leaves (12,411 leaves in total) and is publicly available under CC BY 4.0 to support targeted herbicide spraying, pest monitoring, and morphological trait analysis. The model achieves strong $mAP_{50}$ and $mAP_{50-95}$ scores, with cotton leaves generally easier to detect and segment than soybean leaves; the ablation indicates that segmentation continues to benefit as the annotated set grows. The resource thus provides a field-ready benchmark for improving data-driven soybean–cotton management strategies.

Abstract

Soybean and cotton are major drivers of many countries' agricultural sectors, offering substantial economic returns but also facing persistent challenges from volunteer plants and weeds that hamper sustainable management. Effectively controlling volunteer plants and weeds demands advanced recognition strategies that can identify these amidst complex crop canopies. While deep learning methods have demonstrated promising results for leaf-level detection and segmentation, existing datasets often fail to capture the complexity of real-world agricultural fields. To address this, we collected 640 high-resolution images from a commercial farm spanning multiple growth stages, weed pressures, and lighting variations. Each image is annotated at the leaf-instance level, with 7,221 soybean and 5,190 cotton leaves labeled via bounding boxes and segmentation masks, capturing overlapping foliage, small leaf size, and morphological similarities. We validate this dataset using YOLOv11, demonstrating state-of-the-art performance in accurately identifying and segmenting overlapping foliage. Our publicly available dataset supports advanced applications such as selective herbicide spraying and pest monitoring and can foster more robust, data-driven strategies for soybean-cotton management.

Paper Structure

This paper contains 11 sections, 5 equations, 7 figures, 5 tables.

Figures (7)

  • Figure 1: Ground-truth annotations include detection bounding boxes, shown in the first row, and segmentation masks, shown in the second row.
  • Figure 2: Growth stage variations in soybean and cotton fields. A 3$\times$3 grid of raw images illustrates early (a--c), middle (d--f), and dense (g--i) canopy stages. In the early stage (1--3 weeks), sparse foliage and minimal leaf overlap simplify segmentation but offer limited complexity. The middle stage (4--7 weeks) introduces denser coverage, partial occlusions, and moderate weed presence. The dense stage (8--10 weeks) exhibits substantial leaf overlap, shading, and varied leaf sizes, posing increased challenges for both detection and segmentation.
  • Figure 3: Illustration of the dataset creation and annotation workflow. Field images are first acquired from near-vertical perspectives and filtered to remove near-identical samples. Experts and a reviewer then generate initial segmentation masks and bounding boxes in CVAT, assisted by SAM. Connected component analysis eliminates small “blob” artifacts in the masks, and any duplicate labels are merged using a 90% IoU filter. The final output includes precise ground-truth masks and bounding boxes for soybean and cotton leaves.
  • Figure 4: Detection comparison. The left image (GT) depicts ground-truth bounding boxes for soybean (yellow) and cotton (purple). The right image (Pred) shows the model’s bounding-box outputs with confidence scores (blue).
  • Figure 5: Segmentation comparison. The left image (GT) illustrates manual annotations, with soybean leaves in yellow and cotton leaves in purple, whereas the right image (Pred) presents the model-generated segmentation masks.
  • ...and 2 more figures
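
The two mask clean-up steps named in the Figure 3 caption (connected component analysis to remove small blobs, then a 90% IoU duplicate filter) can be sketched as below. This is a minimal illustration and not the authors' code: the function names, the `min_area` threshold, and the use of 4-connectivity are assumptions, and where the caption says duplicates are "merged", this sketch simply keeps the first mask of each duplicate group.

```python
from collections import deque


def remove_small_blobs(mask, min_area):
    """Keep only 4-connected components with at least `min_area` pixels.

    `mask` is a list of rows of 0/1 values; `min_area` is a hypothetical
    threshold (the source does not state the value used).
    """
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    cleaned = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if not mask[r][c] or seen[r][c]:
                continue
            # Flood-fill one connected component starting at (r, c).
            component, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Copy the component to the output only if it is large enough.
            if len(component) >= min_area:
                for y, x in component:
                    cleaned[y][x] = 1
    return cleaned


def mask_iou(a, b):
    """Intersection over union of two binary masks of equal shape."""
    inter = union = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            inter += bool(pa and pb)
            union += bool(pa or pb)
    return inter / union if union else 0.0


def drop_duplicates(masks, iou_threshold=0.9):
    """Keep the first mask of every group whose pairwise IoU meets the threshold."""
    kept = []
    for m in masks:
        if all(mask_iou(m, k) < iou_threshold for k in kept):
            kept.append(m)
    return kept


# Toy example: one leaf-like region plus a single stray pixel.
leaf = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 0],
]
cleaned = remove_small_blobs(leaf, min_area=3)
unique = drop_duplicates([cleaned, cleaned])
```

In the toy example the stray pixel is discarded because its component is smaller than `min_area`, and the second copy of the cleaned mask is dropped because its IoU with the first is 1.0. On full-resolution masks an array library would be a more practical choice than nested lists.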