A Leaf-Level Dataset for Soybean-Cotton Detection and Segmentation
Thiago H. Segreto, Juliano Negri, Paulo H. Polegato, João Manoel Herrera Pinheiro, Ricardo Godoy, Marcelo Becker
TL;DR
To enable robust leaf-level detection and segmentation under field conditions, the authors assembled a dataset of 640 high-resolution images of soybean and cotton captured across varying growth stages, weed pressures, and lighting conditions, with leaves annotated using instance masks and bounding boxes. Ground truth was produced in CVAT with assistance from SAM (Segment Anything Model) and stored in COCO format. Performance was validated with a medium-size YOLOv11 model using five-fold cross-validation and data-ablation analyses. The dataset comprises 7,221 soybean leaves and 5,190 cotton leaves (12,411 leaves total) and is publicly available under a CC BY 4.0 license to support targeted herbicide spraying, pest monitoring, and morphological trait analysis. Across evaluations, the model achieves strong $mAP_{50}$ and $mAP_{50-95}$ scores, and cotton leaves are generally easier to detect and segment than soybean leaves. The ablation indicates that segmentation continues to benefit from larger annotated sets. This resource thus provides a field-ready benchmark for improving data-driven soybean-cotton management strategies.
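The five-fold cross-validation can be illustrated with a short sketch. This is not the authors' protocol. It assumes that the 640 images are shuffled with a fixed seed and split at the image level, so that all leaves from one image fall into the same fold. The function name `five_fold_splits` and the image file names are hypothetical.

```python
import random


def five_fold_splits(image_ids, k=5, seed=0):
    """Split image IDs into k disjoint folds and yield (train, val) pairs.

    Assumption (not taken from the paper): images are shuffled with a fixed
    seed and split per image, so every leaf in an image lands in one fold.
    """
    ids = sorted(image_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    # Round-robin assignment keeps fold sizes within one image of each other.
    folds = [ids[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, val


if __name__ == "__main__":
    # Hypothetical file names standing in for the 640 dataset images.
    images = [f"img_{n:04d}.jpg" for n in range(640)]
    for fold, (train, val) in enumerate(five_fold_splits(images)):
        print(f"fold {fold}: {len(train)} train / {len(val)} val")
```

With 640 images, each fold holds out 128 images for validation and trains on the remaining 512. Every image is validated exactly once, so the per-fold $mAP_{50}$ and $mAP_{50-95}$ scores can then be averaged. Training YOLOv11 on each split is not shown.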
Abstract
Soybean and cotton are major drivers of many countries' agricultural sectors. They offer substantial economic returns but also face persistent challenges from volunteer plants and weeds that hamper sustainable management. Effectively controlling volunteer plants and weeds requires advanced recognition methods that can identify them within complex crop canopies. Deep learning methods have shown promising results for leaf-level detection and segmentation, but existing datasets often fail to capture the complexity of real-world agricultural fields. To address this gap, we collected 640 high-resolution images from a commercial farm spanning multiple growth stages, weed pressures, and lighting conditions. Each image is annotated at the leaf-instance level. In total, 7,221 soybean and 5,190 cotton leaves are labeled with bounding boxes and segmentation masks, capturing overlapping foliage, small leaf sizes, and morphological similarities. We validate the dataset using YOLOv11, demonstrating state-of-the-art performance in identifying and segmenting overlapping foliage. Our publicly available dataset supports applications such as selective herbicide spraying and pest monitoring and can foster more robust, data-driven strategies for soybean-cotton management.
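The annotations are stored in COCO format, so per-class leaf counts such as the 7,221 soybean and 5,190 cotton leaves can be recomputed directly from the annotation file. The sketch below uses a tiny hypothetical annotation dict in place of the released file and assumes the category names `soybean` and `cotton`. It also shows how a COCO `bbox` (`[x, y, width, height]`) relates to a polygon segmentation mask. The helper names `load_coco`, `count_instances`, and `bbox_from_polygon` are illustrative and are not part of any released tooling.

```python
import json
from collections import Counter


def load_coco(path):
    """Read a COCO-format annotation file (path supplied by the caller)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def count_instances(coco):
    """Count annotated instances per category name in a COCO-style dict."""
    names = {c["id"]: c["name"] for c in coco["categories"]}
    return Counter(names[a["category_id"]] for a in coco["annotations"])


def bbox_from_polygon(poly):
    """Tight COCO bbox [x, y, width, height] around a flat [x1, y1, x2, y2, ...] polygon."""
    xs, ys = poly[0::2], poly[1::2]
    x0, y0 = min(xs), min(ys)
    return [x0, y0, max(xs) - x0, max(ys) - y0]


# Trimmed-down toy annotations; category names "soybean" and "cotton" are assumptions.
toy = {
    "images": [{"id": 1, "file_name": "img_0001.jpg"}],
    "categories": [{"id": 1, "name": "soybean"}, {"id": 2, "name": "cotton"}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1,
         "segmentation": [[10, 20, 60, 20, 60, 60, 10, 60]],
         "bbox": [10, 20, 50, 40], "iscrowd": 0},
        {"id": 2, "image_id": 1, "category_id": 1,
         "segmentation": [[70, 80, 90, 80, 80, 100]],
         "bbox": [70, 80, 20, 20], "iscrowd": 0},
        {"id": 3, "image_id": 1, "category_id": 2,
         "segmentation": [[5, 5, 25, 5, 25, 15, 5, 15]],
         "bbox": [5, 5, 20, 10], "iscrowd": 0},
    ],
}

if __name__ == "__main__":
    print(count_instances(toy))
    for ann in toy["annotations"]:
        # Each stored bbox should equal the tight box around its polygon.
        assert bbox_from_polygon(ann["segmentation"][0]) == ann["bbox"]
```

On the real annotation file, `count_instances(load_coco(path))` should reproduce the per-class totals only if the category names match the ones assumed here. The two class counts sum to the 12,411 leaves reported for the dataset.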
