Semi-Supervised Weed Detection in Vegetable Fields: In-domain and Cross-domain Experiments

Boyang Deng, Yuzhen Lu

TL;DR

This work tackles the data-labeling bottleneck in weed detection for vegetable fields by applying semi-supervised object detection (SSOD) and introducing WeedTeacher, a YOLOv8-based SSOD method. It presents a new two-domain weed dataset (19,931 images, including 11,496 unlabeled images in the new domain) and evaluates WeedTeacher against three existing SSOD methods and fully supervised baselines in both in-domain and cross-domain setups. In-domain results show that WeedTeacher provides the best SSOD gains over the YOLOv8l baseline, with $mAP@50$ increasing by 2.6 points and $mAP@50:95$ by 3.1 points; however, cross-domain experiments reveal little to no benefit from unlabeled new-domain data, highlighting significant cross-domain adaptation challenges. The findings motivate future work on domain-adaptive semi-supervised learning and on extending SSOD frameworks to evolving detector architectures to enable robust, scalable weed detection in real-world agricultural settings.

Abstract

Robust weed detection remains a challenging task in precision weeding, requiring not only potent weed detection models but also large-scale, labeled data. However, labeled data adequate for model training is difficult to come by in practice, because labeling is time-consuming and labor-intensive and requires specialized expertise to recognize plant species. This study introduces semi-supervised object detection (SSOD) methods for leveraging unlabeled data for enhanced weed detection and proposes a new YOLOv8-based SSOD method, i.e., WeedTeacher. An experimental comparison of four SSOD methods, including three existing frameworks (i.e., DenseTeacher, EfficientTeacher, and SmallTeacher) and WeedTeacher, alongside fully supervised baselines, was conducted for weed detection in both in-domain and cross-domain contexts. A new, diverse weed dataset was created as the testbed, comprising a total of 19,931 field images from two differing domains: 8,435 labeled (basic-domain) images acquired by handheld devices from 2021 to 2023 and 11,496 unlabeled (new-domain) images acquired by a ground-based mobile platform in 2024. The in-domain experiment, with models trained using 10% of the labeled, basic-domain images and tested on the remaining 90% of the data, showed that the YOLOv8-based WeedTeacher achieved the highest accuracy among all four SSOD methods, with an improvement of 2.6 percentage points in mAP@50 and 3.1 percentage points in mAP@50:95 over its supervised baseline (i.e., YOLOv8l). In the cross-domain experiment, where the unlabeled new-domain data was incorporated, all four SSOD methods, however, yielded no or limited improvements over their supervised counterparts. Further research is needed to address the difficulty of cross-domain data utilization for robust weed detection.

Paper Structure

This paper contains 11 sections, 1 equation, 4 figures, 2 tables.

Figures (4)

  • Figure 1: The ground-based mobile platform for acquiring images in vegetable fields in 2024.
  • Figure 2: Example images from the labeled basic domain (left) and the unlabeled new domain (right).
  • Figure 3: YOLOv8-based WeedTeacher framework. EMA denotes exponential moving average. Teacher and student models share the same YOLOv8-large architecture.
  • Figure 4: The modeling pipeline of semi-supervised weed detection experiments. The yellow path represents modeling data within the basic domain, while the blue path indicates the experiment incorporating the new domain images as unlabeled data.
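
The caption of Figure 3 mentions an exponential moving average (EMA) linking the teacher and student models. As a rough illustration of what such a link usually looks like in teacher-student SSOD, the sketch below updates teacher weights as a slowly moving average of student weights. It is a minimal sketch, not the authors' implementation: the function name `ema_update`, the plain-dictionary weight representation, and the decay value 0.999 are all assumptions made for illustration, and the source does not state the decay used by WeedTeacher.

```python
def ema_update(teacher, student, decay=0.999):
    """Return new teacher weights as an EMA of the student weights.

    `teacher` and `student` map parameter names to floats (a stand-in for
    real model tensors). `decay` is a hypothetical value, not from the paper.
    """
    if teacher.keys() != student.keys():
        raise ValueError("teacher and student must share the same parameters")
    # Each teacher weight keeps `decay` of its old value and takes
    # (1 - decay) of the corresponding student weight.
    return {
        name: decay * teacher[name] + (1.0 - decay) * student[name]
        for name in teacher
    }


# Toy example: the teacher moves a small step toward the student.
teacher_weights = {"w": 1.0, "b": 0.0}
student_weights = {"w": 0.0, "b": 1.0}
teacher_weights = ema_update(teacher_weights, student_weights, decay=0.9)
```

With a decay close to 1, the teacher changes slowly and averages the student over many training steps, which is the usual motivation for generating pseudo-labels from the teacher rather than from the student directly.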