Coconut Palm Tree Counting on Drone Images with Deep Object Detection and Synthetic Training Data
Tobias Rohe, Barbara Böhm, Michael Kölle, Jonas Stein, Robert Müller, Claudia Linnhoff-Popien
TL;DR
This work tackles the problem of counting coconut palm trees in drone imagery from Ghana by fine-tuning YOLOv7 on synthetic training data, generated from real palm cutouts and AI-generated backgrounds, to address data scarcity. The study shows that synthetic data can raise mAP@0.5 from $0.65$ to $0.88$; the best model, which benefited from green background textures and multi-class training, predicted $199$ palms on a test set containing $187$ labeled trees. The approach supports a semi-automated, scalable workflow for farm management and yield planning, with potential extensions to health assessment and cloud-based deployment. Overall, the results illustrate a practical integration of deep object detection and synthetic data that reduces manual surveying and improves agricultural decision-making.
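The synthetic-data idea can be illustrated with a small, library-free sketch. It assumes a cutout is a 2D grid in which `None` marks transparent pixels, pastes randomly chosen cutouts at non-overlapping random positions on a background grid, and writes one label per pasted object in the YOLO text format (class id, then box centre, width, and height normalised to the image size). All function names are hypothetical, and the non-overlap rule is an illustrative assumption rather than the authors' documented procedure; a real pipeline would operate on RGBA image files, for example with Pillow's `Image.paste` using the cutout's alpha channel as the mask.

```python
import random
from typing import List, Optional, Sequence, Tuple

Pixel = Optional[int]            # None = transparent pixel in a cutout
Grid = List[List[Pixel]]
Box = Tuple[int, int, int, int]  # (x0, y0, width, height) in pixels


def paste_cutout(image: List[List[int]], cutout: Grid, x0: int, y0: int) -> None:
    """Copy the opaque pixels of `cutout` into `image` with its top-left corner at (x0, y0)."""
    for dy, row in enumerate(cutout):
        for dx, value in enumerate(row):
            if value is not None:
                image[y0 + dy][x0 + dx] = value


def yolo_label(class_id: int, box: Box, img_w: int, img_h: int) -> str:
    """Format one YOLO label line: class, box centre (cx, cy), width, height, normalised to [0, 1]."""
    x0, y0, w, h = box
    cx = (x0 + w / 2) / img_w
    cy = (y0 + h / 2) / img_h
    return f"{class_id} {cx:.6f} {cy:.6f} {w / img_w:.6f} {h / img_h:.6f}"


def overlaps(a: Box, b: Box) -> bool:
    """True if two axis-aligned boxes share any area."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def compose(background: List[List[int]], cutouts: Sequence[Tuple[int, Grid]],
            n_objects: int, rng: random.Random, max_tries: int = 100):
    """Paste up to `n_objects` randomly chosen (class_id, cutout) pairs at non-overlapping
    random positions; return the composed image and its YOLO label lines."""
    img_h, img_w = len(background), len(background[0])
    image = [row[:] for row in background]  # leave the input background untouched
    placed: List[Box] = []
    labels: List[str] = []
    for _ in range(n_objects):
        class_id, cutout = rng.choice(cutouts)
        h, w = len(cutout), len(cutout[0])
        for _ in range(max_tries):  # retry until a free spot is found
            box = (rng.randint(0, img_w - w), rng.randint(0, img_h - h), w, h)
            if not any(overlaps(box, other) for other in placed):
                paste_cutout(image, cutout, box[0], box[1])
                placed.append(box)
                # The label box is the cutout's full extent, assuming tightly cropped cutouts.
                labels.append(yolo_label(class_id, box, img_w, img_h))
                break
    return image, labels


# Toy usage: a 64x64 background (stand-in for a green texture) and one 3x3 "palm" of class 0.
rng = random.Random(0)
background = [[0] * 64 for _ in range(64)]
palm: Grid = [[None, 1, None],
              [1, 1, 1],
              [None, 1, None]]
image, labels = compose(background, [(0, palm)], n_objects=5, rng=rng)
for line in labels:
    print(line)
```

Because each label is derived from the exact paste position, the annotations come for free, which is the main practical appeal of synthetic data when labeled drone images are scarce.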
Abstract
Drones have revolutionized various domains, including agriculture. Recent advances in deep learning have, among other things, propelled object detection in computer vision. This study used YOLO, a real-time object detector, to identify and count coconut palm trees in drone footage of a Ghanaian farm. Because its trees were planted in different phases, the farm has lost track of how many it has. Manual counting would be tedious and error-prone, yet an accurate tree count is crucial for planning and managing agricultural processes, especially for optimizing yields and predicting production. We assessed YOLO for palm detection within a semi-automated framework, evaluated ways to improve its accuracy, and considered its potential for farmers. The data were captured by drone in September 2022. To optimize YOLO despite scarce data, we created synthetic images for model training and validation. The YOLOv7 model, pretrained on the COCO dataset (which does not include coconut palms), was adapted using this tailored data: trees cut out of the footage were placed on synthetic images, while testing used separate, authentic images. In our experiments, we tuned hyperparameters to improve YOLO's mean average precision (mAP), and we tested various flight altitudes to determine the best drone height. The mAP@0.5 rose from an initial $0.65$ to $0.88$, highlighting the value of synthetic images in agricultural scenarios.
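For readers unfamiliar with the headline metric, mAP@0.5 averages, over classes, the area under each class's precision-recall curve, where a detection counts as a true positive only if its box overlaps a not-yet-matched ground-truth box with an intersection over union (IoU) of at least 0.5. The single-class sketch below makes that definition concrete; with one class, mAP@0.5 equals AP@0.5. It is a simplified illustration with hypothetical function names, not the evaluation code used in the study, and YOLOv7's own evaluation script differs in details such as how the precision-recall curve is interpolated.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), corner coordinates in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds: List[Tuple[float, Box]], gts: List[Box], iou_thr: float = 0.5) -> float:
    """Single-class AP at `iou_thr` with all-point interpolation of the precision-recall curve."""
    if not gts:
        return 0.0
    matched = [False] * len(gts)
    tp = fp = 0
    recalls: List[float] = []
    precisions: List[float] = []
    # Visit predictions from most to least confident.
    for _, box in sorted(preds, key=lambda p: p[0], reverse=True):
        # Greedily match to the best-overlapping ground-truth box not yet claimed.
        best_j, best_iou = -1, iou_thr
        for j, gt in enumerate(gts):
            overlap = iou(box, gt)
            if not matched[j] and overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            matched[best_j] = True
            tp += 1
        else:
            fp += 1  # duplicate or spurious detection
        recalls.append(tp / len(gts))
        precisions.append(tp / (tp + fp))
    # Precision envelope: make precision non-increasing as recall grows.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the step-shaped curve: sum precision over each recall increment.
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [(0.9, (0, 0, 10, 10)), (0.8, (50, 50, 60, 60)), (0.7, (21, 21, 31, 31))]
print(f"AP@0.5 = {average_precision(preds, gts):.3f}")  # prints "AP@0.5 = 0.833"
```

Matching matters for counting as well: a raw detection count can land close to the true number of trees while still containing false positives and misses, and AP@0.5 penalizes both.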
