
S$^3$AD: Semi-supervised Small Apple Detection in Orchard Environments

Robert Johanson, Christian Wilms, Ole Johannsen, Simone Frintrop

TL;DR

S$^3$AD tackles apple detection in orchards under data scarcity by reframing it as a semi-supervised problem and introducing the MAD dataset, which combines 105 labeled images (14,667 annotated apple instances) with 4,440 unlabeled images. The approach combines a context-driven TreeAttention module, selective tiling of high-attention regions, and a semi-supervised Faster R-CNN trained under the Soft Teacher framework to exploit the unlabeled data. Empirical results on MAD and the MSU dataset show improvements of up to 14.9% over strong fully supervised baselines, especially for small apples, with both tiling and semi-supervised learning contributing to the AP gains at an acceptable runtime. The work also analyzes how apple properties such as relative size, occlusion, and lighting affect detection, confirming that small, occluded, and extremely lit apples are particularly difficult, and demonstrating that the method generalizes across datasets. Overall, the MAD dataset and the S$^3$AD pipeline offer a practical, scalable path toward semi-automatic orchard monitoring and yield-related applications.

Abstract

Crop detection is integral to precision agriculture applications such as automated yield estimation or fruit picking. However, crop detection, e.g., apple detection in orchard environments, remains challenging due to a lack of large-scale datasets and the small relative size of the crops in the image. In this work, we address these challenges by reformulating the apple detection task in a semi-supervised manner. To this end, we provide the large, high-resolution dataset MAD comprising 105 labeled images with 14,667 annotated apple instances and 4,440 unlabeled images. Utilizing this dataset, we also propose a novel Semi-Supervised Small Apple Detection system S$^3$AD based on contextual attention and selective tiling to improve the challenging detection of small apples while limiting the computational overhead. We conduct an extensive evaluation on MAD and the MSU dataset, showing that S$^3$AD substantially outperforms strong fully-supervised baselines, including several small object detection systems, by up to $14.9\%$. Additionally, we exploit the detailed annotations of our dataset w.r.t. apple properties to analyze the influence of relative size or level of occlusion on the results of various systems, quantifying current challenges.
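The selective tiling idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes the attention map is a 2D array with values in [0, 1] and that a tile is kept when its mean attention reaches a threshold; the function name `select_tiles`, the parameters `tile`, `stride`, and `threshold`, and the mean-based selection rule are all hypothetical choices for illustration, not the criterion used by S$^3$AD.

```python
import numpy as np

def select_tiles(attention, tile=4, stride=2, threshold=0.5):
    """Return (y, x, height, width) windows whose mean attention is >= threshold.

    Hypothetical selection rule: the paper's actual criterion is not
    given in this excerpt, so a plain mean over the window is assumed.
    """
    h, w = attention.shape
    tiles = []
    # A stride smaller than the tile size yields overlapping tiles.
    for y in range(0, max(h - tile, 0) + 1, stride):
        for x in range(0, max(w - tile, 0) + 1, stride):
            window = attention[y:y + tile, x:x + tile]
            if window.mean() >= threshold:
                tiles.append((y, x, tile, tile))
    return tiles

# Toy attention map: only the top-left quadrant is "tree-like".
attention = np.zeros((8, 8))
attention[0:4, 0:4] = 1.0
print(select_tiles(attention, threshold=0.6))
```

Only the selected windows would then be passed to the detector, which is how tiling can raise the effective resolution for small apples without processing the whole image at full cost.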

Paper Structure (22 sections, 8 figures, 4 tables)


Figures (8)

  • Figure 1: Apple detection results using Faster R-CNN+FPN [ren2015faster, lin2017feature] and our proposed semi-supervised small apple detection system S$^3$AD with selective tiling on a test image of our MAD dataset. Red arrows denote missed apples, while green arrows denote detected apples that the other system missed.
  • Figure 2: An example image (left) and the corresponding ground truth (right) from the test split of our dataset MAD.
  • Figure 3: System figure of our proposed semi-supervised small apple detection approach S$^3$AD. First, an attention map is generated with our TreeAttention module. The attention map is used by our selective tiling module to crop the most promising image regions into a set of overlapping tiles, which are subsequently processed by a semi-supervised Faster R-CNN object detector. During filtering and reconstruction, the per-tile results are merged.
  • Figure 4: Example of the filled alpha shape (a) that is used to train TreeAttention and the corresponding bounding boxes (b).
  • Figure 5: Apple detection results of S$^3$AD with and without tiling, Faster R-CNN+FPN, PANet, and Deformable DETR in terms of AR for property-specific ranges. Each point on the curves represents a bin of $2\%$ of the annotated apples in the test split of our dataset MAD.
  • ...and 3 more figures
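The filtering and reconstruction step named in the Figure 3 caption merges per-tile results back into image coordinates. A rough sketch of one way to do this follows. It shifts each box by its tile offset and then applies plain greedy non-maximum suppression to remove duplicates from overlapping tiles. Greedy NMS is swapped in here as a stand-in, since the excerpt does not describe the actual filtering rule; `merge_tile_detections`, `iou`, and the input layout are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2, ...)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def merge_tile_detections(per_tile, iou_threshold=0.5):
    """Merge detections from overlapping tiles into one list.

    per_tile: list of ((offset_x, offset_y), [(x1, y1, x2, y2, score), ...])
    with box coordinates relative to the tile (assumed layout).
    """
    boxes = []
    for (ox, oy), detections in per_tile:
        for x1, y1, x2, y2, score in detections:
            # Shift tile-local coordinates into full-image coordinates.
            boxes.append((x1 + ox, y1 + oy, x2 + ox, y2 + oy, score))
    # Greedy NMS: keep the highest-scoring box, drop strong overlaps.
    boxes.sort(key=lambda b: b[4], reverse=True)
    kept = []
    for box in boxes:
        if all(iou(box, k) < iou_threshold for k in kept):
            kept.append(box)
    return kept

per_tile = [
    ((0, 0), [(10, 10, 20, 20, 0.9)]),
    ((8, 0), [(2, 10, 12, 20, 0.8), (30, 30, 40, 40, 0.7)]),
]
merged = merge_tile_detections(per_tile)
print(merged)
```

In this toy input the same apple is seen in both tiles, so only the higher-scoring copy survives, while the detection that appears in a single tile is kept unchanged.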