AppleGrowthVision: A large-scale stereo dataset for phenological analysis, fruit detection, and 3D reconstruction in apple orchards
Laura-Sophia von Hirschhausen, Jannes S. Magnusson, Mykyta Kovalenko, Fredrik Boye, Tanay Rawat, Peter Eisert, Anna Hilsmann, Sebastian Pretzsch, Sebastian Bosse
TL;DR
AppleGrowthVision introduces a large-scale, publicly available stereo orchard dataset with agriculturally validated BBCH growth-stage annotations, covering a full growth cycle and collected at two German sites. The dataset comprises 9,317 stereo pairs across six BBCH stages and 1,125 densely annotated images with 31,084 apple labels, enabling precise phenological analysis and 3D orchard reconstruction. Empirical evaluations show that combining AppleGrowthVision with existing datasets improves apple detection performance for YOLOv8 and Faster R-CNN, and that principal BBCH stages can be classified with over 95% accuracy using standard CNN backbones. The work demonstrates strong potential for robust fruit detection, growth modeling, and 3D reconstruction in precision agriculture, while highlighting challenges in annotation automation and the need for multimodal benchmarks and transformer-based baselines.
Abstract
Deep learning has transformed computer vision for precision agriculture, yet apple orchard monitoring remains limited by dataset constraints: diverse, realistic datasets are lacking, and dense, heterogeneous scenes are difficult to annotate. Existing datasets overlook different growth stages and stereo imagery, both of which are essential for realistic 3D modeling of orchards and for tasks such as fruit localization, yield estimation, and structural analysis. To address these gaps, we present AppleGrowthVision, a large-scale dataset comprising two subsets. The first includes 9,317 high-resolution stereo images collected at a farm in Brandenburg (Germany), covering six agriculturally validated growth stages over a full growth cycle. The second consists of 1,125 densely annotated images from the same farm in Brandenburg and from one in Pillnitz (Germany), containing a total of 31,084 apple labels. AppleGrowthVision provides stereo-image data with agriculturally validated growth stages, enabling precise phenological analysis and 3D reconstruction. Extending MinneApple with our data improves YOLOv8 performance by 7.69 % in terms of F1-score, while adding it to MinneApple and MAD boosts the Faster R-CNN F1-score by 31.06 %. Additionally, six BBCH stages were predicted with over 95 % accuracy using VGG16, ResNet152, DenseNet201, and MobileNetV2. AppleGrowthVision bridges the gap between agricultural science and computer vision by enabling the development of robust models for fruit detection, growth modeling, and 3D analysis in precision agriculture. Future work includes improving annotation, enhancing 3D reconstruction, and extending multimodal analysis across all growth stages.
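Stereo pairs are what make the 3D reconstruction mentioned above possible. For a rectified stereo pair, the depth Z of a point follows from its disparity d (horizontal pixel offset between the left and right views) as Z = f · B / d, where f is the focal length in pixels and B is the camera baseline. The following is a minimal Python sketch under stated assumptions: the images are already rectified, a disparity map has already been produced by some stereo matcher, and the calibration values `FOCAL_PX` and `BASELINE_M` are hypothetical placeholders, not the parameters of the AppleGrowthVision camera rig. The helper `disparity_to_depth` is likewise illustrative.

```python
import numpy as np


def disparity_to_depth(disparity, focal_px, baseline_m, min_disparity=1e-6):
    """Convert a disparity map (in pixels) to metric depth via Z = f * B / d.

    Pixels whose disparity is <= min_disparity (no valid stereo match)
    are marked as NaN instead of producing infinite depth.
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.full(disparity.shape, np.nan)
    valid = disparity > min_disparity
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth


# Hypothetical calibration values, NOT taken from the AppleGrowthVision rig.
FOCAL_PX = 1400.0   # focal length in pixels
BASELINE_M = 0.12   # distance between the two camera centers in meters

# Toy 2x2 disparity map; 0.0 stands for "no match found".
disp = np.array([[84.0, 42.0], [0.0, 21.0]])
depth = disparity_to_depth(disp, FOCAL_PX, BASELINE_M)
print(depth)
```

With these placeholder values, f · B = 168, so disparities of 84, 42, and 21 pixels map to depths of 2, 4, and 8 meters, and the unmatched pixel stays NaN. Depth is inversely proportional to disparity, so distant fruit (small disparity) is reconstructed with coarser depth resolution than fruit close to the cameras.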
