
CWD30: A Comprehensive and Holistic Dataset for Crop Weed Recognition in Precision Agriculture

Talha Ilyas, Dewa Made Sri Arsa, Khubaib Ahmad, Yong Chae Jeong, Okjae Won, Jong Hoon Lee, Hyongsuk Kim

TL;DR

CWD30 addresses a critical need in precision agriculture for robust crop-weed recognition by providing a large-scale, hierarchical, multi-view dataset covering 10 crops and 20 weeds across growth stages and environments. The dataset supports holistic modeling with full-plant images captured from multiple viewpoints, and it includes a hold-out test set and three-fold splits for rigorous evaluation. Baseline experiments show that transformer architectures achieve the best performance, and that pretraining on CWD30 improves downstream tasks such as semantic segmentation while accelerating convergence. Overall, CWD30 serves as a practical benchmark for advancing robust, generalizable CAPA systems and for fostering collaboration across research and application settings.
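The TL;DR mentions a hold-out test set and three-fold splits but does not say how they are built. The following is a minimal sketch, assuming a per-class (stratified) random split. The function name `make_splits`, the 20% test fraction, and the seed are hypothetical choices for illustration; they are not the CWD30 protocol.

```python
import random
from collections import defaultdict


def make_splits(samples, test_fraction=0.2, n_folds=3, seed=0):
    """Stratified hold-out test set plus n_folds (train, val) splits.

    samples: list of (image_path, class_label) tuples.
    test_fraction, n_folds and seed are illustrative defaults, not CWD30's protocol.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for path, label in samples:
        by_class[label].append((path, label))

    test, folds = [], [[] for _ in range(n_folds)]
    for label in sorted(by_class):
        items = list(by_class[label])
        rng.shuffle(items)
        n_test = round(len(items) * test_fraction)
        test.extend(items[:n_test])  # per-class slice keeps class ratios in the test set
        # Deal the remaining items round-robin so the class mix stays similar across folds.
        for i, item in enumerate(items[n_test:]):
            folds[i % n_folds].append(item)

    # Fold k is the validation set; the remaining folds form the training set.
    splits = []
    for k in range(n_folds):
        train = [item for j, fold in enumerate(folds) if j != k for item in fold]
        splits.append((train, folds[k]))
    return test, splits


# Toy usage: 75 "crop" and 25 "weed" images with made-up paths.
samples = [(f"img_{i}.jpg", "crop" if i % 4 else "weed") for i in range(100)]
test, splits = make_splits(samples)
print(len(test), [len(val) for _, val in splits])  # 20 [27, 27, 26]
```

Holding the test set out before creating the folds means that no test image ever appears in any training or validation fold. Stratifying per class also keeps rare weed species present in every split.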

Abstract

The growing demand for precision agriculture necessitates efficient and accurate crop-weed recognition and classification systems. Current datasets often lack the sample size, diversity, and hierarchical structure needed to develop robust deep learning models for discriminating crops from weeds in agricultural fields. Moreover, the similar external structure and phenomics of crops and weeds complicate recognition. To address these issues, we present CWD30, a large-scale, diverse, holistic, and hierarchical dataset tailored for crop-weed recognition in precision agriculture. CWD30 comprises over 219,770 high-resolution images of 20 weed species and 10 crop species, covering various growth stages, multiple viewing angles, and environmental conditions. The images were collected from diverse agricultural fields across different geographic locations and seasons to ensure that the dataset is representative. The dataset's hierarchical taxonomy enables fine-grained classification and facilitates the development of more accurate, robust, and generalizable deep learning models. We conduct extensive baseline experiments to validate the efficacy of CWD30. Our experiments reveal that the dataset poses significant challenges due to intra-class variation, inter-class similarity, and data imbalance. Additionally, we demonstrate that minor training modifications, such as using CWD30-pretrained backbones, can significantly enhance model performance and reduce convergence time, saving training resources on several downstream tasks. The challenges identified provide valuable insights and opportunities for future research in crop-weed recognition. We believe that CWD30 will serve as a benchmark for evaluating crop-weed recognition algorithms, promote advancements in precision agriculture, and foster collaboration among researchers in the field.
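The abstract credits the hierarchical taxonomy with enabling fine-grained classification and names data imbalance as a key challenge, but it does not say how either is handled during training. The following is a minimal sketch under stated assumptions:

- A two-level crop/weed → species hierarchy, using hypothetical placeholder species names. The real taxonomy is shown in Figure 4.
- Inverse-frequency class weights, using the n_samples / (n_classes × count) formula of scikit-learn's "balanced" heuristic.
- Crop-versus-weed probabilities obtained by summing species-level softmax outputs.

None of these names or choices are taken from the paper.

```python
from collections import Counter

# Hypothetical two-level hierarchy (macro class -> species). The species names
# are placeholders; CWD30's real taxonomy has 10 crop and 20 weed species.
TAXONOMY = {
    "crop": ["crop_a", "crop_b"],
    "weed": ["weed_a", "weed_b", "weed_c"],
}
SPECIES_TO_MACRO = {s: macro for macro, members in TAXONOMY.items() for s in members}


def class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count).

    Rare species get larger weights; averaged over samples, the weights equal 1.
    """
    counts = Counter(labels)
    total, n_classes = len(labels), len(counts)
    return {c: total / (n_classes * n) for c, n in counts.items()}


def macro_probs(species_probs):
    """Roll species-level probabilities up to crop vs. weed by summing members."""
    out = {macro: 0.0 for macro in TAXONOMY}
    for species, p in species_probs.items():
        out[SPECIES_TO_MACRO[species]] += p
    return out


# Toy usage with made-up labels and softmax outputs.
labels = ["crop_a"] * 6 + ["weed_a"] * 3 + ["weed_b"] * 3
print(class_weights(labels))
print(macro_probs({"crop_a": 0.1, "crop_b": 0.2, "weed_a": 0.4, "weed_b": 0.2, "weed_c": 0.1}))
```

Summing is valid when a single softmax over all species is used. In that case, the probability of "weed" is the total probability mass on the weed species, so one fine-grained model also yields a crop-versus-weed decision.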

Paper Structure

This paper contains 19 sections, 14 figures, 6 tables.

Figures (14)

  • Figure 1: Crop and weed image samples from the CWD30 dataset, captured at different life-cycle stages, under varying environments, and from different viewing angles. Key elements in the images are highlighted: pink-bordered images represent similarities at the macro-class level (crop vs. weed); orange boxes indicate variability within a single weed species due to environmental factors such as indoor vs. outdoor settings and soil type; images encased in red and brown borders show visually similar crop and weed classes; images marked with black dashed lines show weeds cultivated in a laboratory setting; small inset boxes on each image give the weather conditions, camera angle, and plant age at the time of capture.
  • Figure 2: A comparative plot of class distributions per viewing angle. Numbers in parentheses give the total number of images in each plant category.
  • Figure 3: Visual comparison of the CWD30 dataset with other related datasets.
  • Figure 4: Taxonomy of the CWD30 dataset, showing the hierarchical organization of the crop and weed species it includes.
  • Figure 5: Comparative analysis of various agricultural datasets: key attributes and characteristics. The symbol ' ' indicates an approximate value. HH, DM, and VM correspond to handheld, device-mounted, and vehicle-mounted cameras, respectively.