S$^3$AD: Semi-supervised Small Apple Detection in Orchard Environments
Robert Johanson, Christian Wilms, Ole Johannsen, Simone Frintrop
TL;DR
S$^3$AD tackles apple detection in orchards under data scarcity by reframing it as a semi-supervised problem and introducing the MAD dataset, which combines 105 labeled images (14,667 annotated apple instances) with 4,440 unlabeled images. The approach combines a context-driven TreeAttention module, selective tiling of high-attention regions, and a semi-supervised Faster R-CNN trained under the Soft Teacher framework to exploit the unlabeled data. Results on MAD and the MSU dataset show substantial improvements over strong fully supervised baselines (up to 14.9%), especially for small apples, with both tiling and semi-supervised learning contributing to the AP gains at an acceptable runtime. The work also analyzes how apple properties such as relative size, occlusion, and lighting affect detection, confirming that small apples, occluded apples, and apples under extreme lighting are particularly difficult to detect, and showing that the method generalizes across datasets. Overall, the MAD dataset and the S$^3$AD pipeline offer a practical, scalable path for semi-automatic orchard monitoring and yield-related applications.
Abstract
Crop detection is integral to precision agriculture applications such as automated yield estimation or fruit picking. However, crop detection, e.g., apple detection in orchard environments, remains challenging due to a lack of large-scale datasets and the small relative size of the crops in the image. In this work, we address these challenges by reformulating apple detection as a semi-supervised task. To this end, we provide the large, high-resolution dataset MAD comprising 105 labeled images with 14,667 annotated apple instances and 4,440 unlabeled images. Utilizing this dataset, we also propose a novel Semi-Supervised Small Apple Detection system S$^3$AD based on contextual attention and selective tiling to improve the challenging detection of small apples while limiting the computational overhead. We conduct an extensive evaluation on MAD and the MSU dataset, showing that S$^3$AD substantially outperforms strong fully supervised baselines, including several small object detection systems, by up to $14.9\%$. Additionally, we exploit the detailed annotations of our dataset with respect to apple properties to analyze the influence of relative size and level of occlusion on the results of various systems, quantifying current challenges.
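The idea of selective tiling can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state how tiles are chosen, so the function name `select_tiles`, the fixed grid layout, the mean-attention criterion, and the `tile_size` and `threshold` parameters are all assumptions made for illustration. The sketch only shows why tiling can be selective: tiles with little attention are skipped, so the detector would run on fewer crops.

```python
def select_tiles(attention, tile_size, threshold):
    """Return (y0, x0, y1, x1) boxes of grid tiles with high mean attention.

    attention: 2D list of floats in [0, 1], one value per pixel (hypothetical
               stand-in for the output of an attention module).
    tile_size: side length of the square grid tiles (assumed parameter).
    threshold: minimum mean attention for a tile to be kept (assumed parameter).
    """
    height = len(attention)
    width = len(attention[0])
    selected = []
    for y0 in range(0, height, tile_size):
        for x0 in range(0, width, tile_size):
            # Clip tiles at the image border.
            y1 = min(y0 + tile_size, height)
            x1 = min(x0 + tile_size, width)
            values = [attention[y][x]
                      for y in range(y0, y1)
                      for x in range(x0, x1)]
            if sum(values) / len(values) >= threshold:
                selected.append((y0, x0, y1, x1))
    return selected


# Toy 4x6 attention map with one high-attention 2x2 region.
attention = [
    [0.0, 0.0, 1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
]
selected = select_tiles(attention, tile_size=2, threshold=0.5)
print(selected)
```

In this toy example only one of the six grid tiles passes the threshold, so a detector applied per tile would process a single crop instead of six. A real system would likely use overlapping tiles and merge detections across tile borders, which this sketch omits.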
