From Seedling to Harvest: The GrowingSoy Dataset for Weed Detection in Soy Crops via Instance Segmentation
Raul Steinmetz, Victor A. Kich, Henrique Krever, Joao D. Rigo Mazzarolo, Ricardo B. Grando, Vinicius Marini, Celio Trois, Ard Nieuwenhuizen
TL;DR
This work tackles weed management in soy crops by introducing GrowingSoy, a temporally rich dataset of 1,000 high-resolution images with pixel-accurate instance segmentation for soy and two weed types across the full growth cycle. It benchmarks six state-of-the-art models (YOLOv5 and YOLOv8 families) on the dataset, reporting strong segmentation performance (e.g., YOLOv8m achieving mAP-50 of 78.7% on caruru, 69.7% on grassy weed, and 90.1% on soy) and demonstrating robust cross-stage generalization. Key contributions include the dataset itself, a detailed annotation pipeline, and a comprehensive model comparison that highlights the strengths of YOLOv8 architectures for multi-class plant segmentation. The dataset and findings have practical implications for automated crop management, enabling temporal tracking of weed invasion and improved decision-making in soybean production, with future potential for disease detection and yield prediction.
Abstract
Deep learning, particularly Convolutional Neural Networks (CNNs), has gained significant attention for its effectiveness in computer vision, especially in agricultural tasks. Recent advancements in instance segmentation have improved image classification accuracy. In this work, we introduce a comprehensive dataset for training neural networks to detect weeds and soy plants through instance segmentation. Our dataset covers various stages of soy growth, offering a chronological perspective on the impact of weed invasion, with 1,000 meticulously annotated images. We also provide six state-of-the-art models, trained on this dataset, that can detect soy and weeds at every stage of the plantation process. Using this dataset for weed and soy segmentation, we achieved an average segmentation precision of 79.1% and an average recall of 69.2% across all plant classes with the YOLOv8x model. Moreover, the YOLOv8m model attained a mean average precision (mAP-50) of 78.7% in caruru weed segmentation, 69.7% in grassy weed segmentation, and 90.1% in soy plant segmentation.
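To make the reported metrics concrete, the following is a minimal sketch of how segmentation precision and recall at an IoU threshold of 0.5 (the "50" in mAP-50) can be computed for a single class. It is an illustration only, not the evaluation code used by the authors: the pixel-set mask representation, the greedy matching rule, and the names `mask_iou`, `precision_recall_at_50`, and `rect` are all assumptions introduced here. A full mAP-50 computation would additionally sweep confidence thresholds, take the area under the resulting precision-recall curve, and average over classes.

```python
from typing import List, Set, Tuple

Pixel = Tuple[int, int]
Mask = Set[Pixel]  # hypothetical representation: an instance mask as a set of (row, col) pixels


def mask_iou(a: Mask, b: Mask) -> float:
    """Intersection over union of two pixel sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def precision_recall_at_50(preds: List[Mask], truths: List[Mask]) -> Tuple[float, float]:
    """Precision and recall with greedy one-to-one matching at IoU >= 0.5.

    Assumes `preds` is already sorted by descending confidence.
    """
    matched = set()  # indices of ground-truth masks already claimed
    true_positives = 0
    for pred in preds:
        best_index, best_iou = -1, 0.0
        for index, truth in enumerate(truths):
            if index in matched:
                continue
            iou = mask_iou(pred, truth)
            if iou > best_iou:
                best_index, best_iou = index, iou
        # A prediction counts as correct only if its best unmatched mask overlaps enough.
        if best_index >= 0 and best_iou >= 0.5:
            matched.add(best_index)
            true_positives += 1
    precision = true_positives / len(preds) if preds else 0.0
    recall = true_positives / len(truths) if truths else 0.0
    return precision, recall


def rect(r0: int, c0: int, r1: int, c1: int) -> Mask:
    """Helper for toy examples: a filled rectangle covering rows r0..r1-1 and columns c0..c1-1."""
    return {(r, c) for r in range(r0, r1) for c in range(c0, c1)}
```

For example, with two ground-truth plants and two predictions, one overlapping a plant at IoU 0.75 and one falling on empty soil, the sketch yields a precision of 0.5 and a recall of 0.5. Recall below precision, as in the reported 69.2% versus 79.1%, corresponds to a model that misses more plants than it hallucinates.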
