CWD30: A Comprehensive and Holistic Dataset for Crop Weed Recognition in Precision Agriculture
Talha Ilyas, Dewa Made Sri Arsa, Khubaib Ahmad, Yong Chae Jeong, Okjae Won, Jong Hoon Lee, Hyongsuk Kim
TL;DR
CWD30 addresses a critical need in precision agriculture for robust crop-weed recognition. It provides a large-scale, hierarchical, multi-view dataset covering 10 crop and 20 weed species across growth stages and environments. The dataset supports holistic modeling through full-plant images captured from multiple viewpoints, and it includes a hold-out test set and three-fold splits for rigorous evaluation. Baseline experiments show that transformer architectures achieve the best performance. Pretraining on CWD30 also improves downstream tasks such as semantic segmentation and accelerates convergence. Overall, CWD30 serves as a practical benchmark for advancing robust, generalizable CAPA systems, and it encourages collaboration among researchers and across applications.
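The TL;DR mentions a hold-out test set and three-fold splits but does not describe how they were constructed. The following is a minimal sketch that assumes class-stratified random splitting. Stratification keeps per-class proportions in every split, which matters for an imbalanced dataset. The function name `make_splits`, the 20% test fraction, and the round-robin fold assignment are all hypothetical illustrations, not the authors' procedure.

```python
import random
from collections import defaultdict


def make_splits(labels, test_frac=0.2, n_folds=3, seed=0):
    """Hypothetical scheme: stratified hold-out test set plus n_folds train/val folds.

    Returns (test_indices, [(train_indices, val_indices), ...]).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)

    test, fold_members = [], [[] for _ in range(n_folds)]
    for lab in sorted(by_class):
        idxs = by_class[lab]
        rng.shuffle(idxs)
        # Hold out a per-class fraction so the test set mirrors class ratios.
        n_test = int(round(len(idxs) * test_frac))
        test.extend(idxs[:n_test])
        # Deal the remaining samples round-robin so every fold sees every class.
        for i, idx in enumerate(idxs[n_test:]):
            fold_members[i % n_folds].append(idx)

    folds = []
    for k in range(n_folds):
        val = sorted(fold_members[k])
        # Training set for fold k is the union of all other folds.
        train = sorted(i for j, m in enumerate(fold_members) if j != k for i in m)
        folds.append((train, val))
    return sorted(test), folds
```

For example, `make_splits(["crop_1"] * 10 + ["weed_1"] * 5)` holds out 3 images for testing and distributes the remaining 12 across three validation folds. The test set never appears in any fold, so it remains untouched until final evaluation.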
Abstract
The growing demand for precision agriculture necessitates efficient and accurate crop-weed recognition and classification systems. Current datasets often lack the sample size, diversity, and hierarchical structure needed to develop robust deep learning models for discriminating crops from weeds in agricultural fields. Moreover, crops and weeds often share similar external structure and phenotypic traits, which complicates recognition. To address these issues, we present the CWD30 dataset, a large-scale, diverse, holistic, and hierarchical dataset tailored for crop-weed recognition tasks in precision agriculture. CWD30 comprises over 219,770 high-resolution images of 20 weed species and 10 crop species, encompassing various growth stages, multiple viewing angles, and environmental conditions. The images were collected from diverse agricultural fields across different geographic locations and seasons, ensuring a representative dataset. The dataset's hierarchical taxonomy enables fine-grained classification and facilitates the development of more accurate, robust, and generalizable deep learning models. We conduct extensive baseline experiments to validate the efficacy of the CWD30 dataset. Our experiments reveal that the dataset poses significant challenges due to intra-class variation, inter-class similarity, and data imbalance. Additionally, we demonstrate that minor training modifications, such as using CWD30-pretrained backbones, can significantly improve model performance and reduce convergence time, saving training resources on several downstream tasks. These challenges provide valuable insights and opportunities for future research in crop-weed recognition. We believe that the CWD30 dataset will serve as a benchmark for evaluating crop-weed recognition algorithms, promote advances in precision agriculture, and foster collaboration among researchers in the field.
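The abstract does not specify the levels of CWD30's hierarchical taxonomy. The following minimal sketch assumes a simple two-level hierarchy, with a coarse crop/weed group above fine-grained species, to show how a hierarchy lets the same predictions be scored at more than one granularity. The class names `crop_1`, `weed_1`, etc., the `TAXONOMY` dictionary, and the `hierarchical_accuracy` helper are hypothetical placeholders, not the dataset's actual labels or evaluation code.

```python
# Hypothetical two-level taxonomy: coarse group -> fine-grained species labels.
TAXONOMY = {
    "crop": ["crop_1", "crop_2", "crop_3"],
    "weed": ["weed_1", "weed_2"],
}

# Invert the hierarchy so any fine-grained label can be rolled up to its group.
FINE_TO_COARSE = {fine: coarse for coarse, fines in TAXONOMY.items() for fine in fines}


def hierarchical_accuracy(y_true, y_pred):
    """Return (fine-grained accuracy, coarse crop/weed accuracy) for paired labels."""
    n = len(y_true)
    fine = sum(t == p for t, p in zip(y_true, y_pred)) / n
    coarse = sum(FINE_TO_COARSE[t] == FINE_TO_COARSE[p] for t, p in zip(y_true, y_pred)) / n
    return fine, coarse


y_true = ["crop_1", "crop_2", "weed_1", "weed_2"]
y_pred = ["crop_2", "crop_2", "weed_2", "crop_1"]
print(hierarchical_accuracy(y_true, y_pred))  # (0.25, 0.75)
```

In this toy example, the model confuses species within a group (fine accuracy 0.25) more often than it confuses crops with weeds (coarse accuracy 0.75). This is the kind of inter-class similarity the abstract highlights, and a per-level breakdown makes it visible.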
