Detection and Geographic Localization of Natural Objects in the Wild: A Case Study on Palms
Kangning Cui, Rongkun Zhu, Manqi Wang, Wei Tang, Gregory D. Larsen, Victor P. Pauca, Sarra Alqahtani, Fan Yang, David Segurado, David Lutz, Jean-Michel Morel, Miles R. Silman
TL;DR
This work tackles automated detection and geolocation of naturally occurring palms in dense tropical forests using large orthomosaic UAV imagery. It introduces PRISM, a modular pipeline that combines YOLOv10-based detection with SAM 2 zero-shot segmentation to produce georeferenced palm centers and masks, augmented by calibration and interpretability tools. The PALMS dataset, covering 21 sites and comprising 8,830 bounding boxes and 5,026 center points, provides a benchmark for cross-site generalization and method comparison. PRISM demonstrates real-time-capable detection on mid-range hardware, strong localization under distribution shifts, and actionable outputs for ecological monitoring and biodiversity assessment, with clear pathways for adapting it to other forest canopies and coarser-resolution imagery.
Abstract
Palms are ecological and economic indicators of tropical forest health, biodiversity, and human impact that support local economies and global forest product supply chains. While palm detection in plantations is well-studied, efforts to map naturally occurring palms in dense forests remain limited by overlapping crowns, uneven shading, and heterogeneous landscapes. We develop PRISM (Processing, Inference, Segmentation, and Mapping), a flexible pipeline for detecting and localizing palms in dense tropical forests using large orthomosaic images. Orthomosaics are created from thousands of aerial images and span several to hundreds of gigabytes. Our contributions are threefold. First, we construct a large UAV-derived orthomosaic dataset collected across 21 ecologically diverse sites in western Ecuador, annotated with 8,830 bounding boxes and 5,026 palm center points. Second, we evaluate multiple state-of-the-art object detectors based on efficiency and performance, integrate zero-shot SAM 2 as the segmentation backbone, and refine the results for precise geographic mapping. Third, we apply calibration methods to align confidence scores with IoU and explore saliency maps for feature explainability. Though optimized for palms, PRISM can be adapted to identify other natural objects, such as eastern white pines. Future work will explore transfer learning for lower-resolution datasets (0.5 to 1 m).
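To illustrate the final mapping step, the sketch below converts a bounding box detected on one tile of an orthomosaic into map coordinates. It is a minimal example under stated assumptions, not the PRISM implementation: it assumes the orthomosaic carries a GDAL-style six-coefficient affine geotransform, that the detector ran on tiles with known pixel offsets, and that the box center stands in for the palm center. The names `GeoTransform`, `box_center`, `pixel_to_map`, and `tile_box_to_map` are hypothetical, as are the numeric values in the usage lines.

```python
from typing import NamedTuple, Tuple


class GeoTransform(NamedTuple):
    """GDAL-style affine geotransform (assumed format, not taken from the paper)."""
    x_origin: float         # map x of the raster's upper-left corner
    pixel_width: float      # map units per pixel along columns
    row_rotation: float     # 0.0 for a north-up raster
    y_origin: float         # map y of the raster's upper-left corner
    column_rotation: float  # 0.0 for a north-up raster
    pixel_height: float     # map units per pixel along rows (negative if north-up)


Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def box_center(box: Box) -> Tuple[float, float]:
    """Return the (column, row) center of a pixel-space bounding box."""
    x_min, y_min, x_max, y_max = box
    return (x_min + x_max) / 2.0, (y_min + y_max) / 2.0


def pixel_to_map(gt: GeoTransform, col: float, row: float) -> Tuple[float, float]:
    """Apply the affine geotransform to a (column, row) pixel position."""
    x = gt.x_origin + col * gt.pixel_width + row * gt.row_rotation
    y = gt.y_origin + col * gt.column_rotation + row * gt.pixel_height
    return x, y


def tile_box_to_map(gt: GeoTransform, tile_col_off: float, tile_row_off: float,
                    box: Box) -> Tuple[float, float]:
    """Map the center of a box detected on a tile to orthomosaic map coordinates.

    The tile offsets shift tile-local pixels into full-orthomosaic pixels
    before the geotransform is applied.
    """
    col, row = box_center(box)
    return pixel_to_map(gt, col + tile_col_off, row + tile_row_off)


# Hypothetical values: a 5 cm/pixel north-up raster in a projected CRS.
gt = GeoTransform(500000.0, 0.05, 0.0, 9900000.0, 0.0, -0.05)
easting, northing = tile_box_to_map(gt, 1000, 2000, (100, 200, 140, 260))
print(easting, northing)
```

Keeping the tile offset separate from the geotransform means the same transform can be reused for every tile cut from one orthomosaic; only the offsets change per tile.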
