
Detection and Geographic Localization of Natural Objects in the Wild: A Case Study on Palms

Kangning Cui, Rongkun Zhu, Manqi Wang, Wei Tang, Gregory D. Larsen, Victor P. Pauca, Sarra Alqahtani, Fan Yang, David Segurado, David Lutz, Jean-Michel Morel, Miles R. Silman

TL;DR

This work tackles automated detection and geolocation of naturally occurring palms in dense tropical forests using large orthomosaic UAV imagery. It introduces PRISM, a modular pipeline that combines YOLOv10-based detection with SAM 2 zero-shot segmentation to produce georeferenced palm centers and masks, augmented by calibration and interpretability tools. The PALMS dataset, covering 21 sites and comprising 8,830 bounding boxes and 5,026 center points, provides a robust benchmark for cross-site generalization and method comparison. PRISM demonstrates real-time-capable detection on mid-range hardware, strong localization under distribution shifts, and actionable outputs for ecological monitoring and biodiversity assessment, with clear pathways to adapt to other forest canopies and coarser-resolution imagery.
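Producing georeferenced palm centers requires mapping a pixel position in the orthomosaic to map coordinates. The sketch below assumes a GDAL-style affine geotransform `(x0, pixel_w, row_rot, y0, col_rot, pixel_h)`, takes the box center as the palm center, and treats each slice as a window offset from the mosaic's top-left corner. These choices, and the helper names `box_center`, `pixel_to_geo`, and `slice_box_to_geo`, are illustrative assumptions rather than PRISM's actual code.

```python
from typing import Tuple

# GDAL-style affine geotransform: (x0, pixel_w, row_rot, y0, col_rot, pixel_h).
# For a north-up orthomosaic the rotation terms are 0 and pixel_h is negative.
GeoTransform = Tuple[float, float, float, float, float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def box_center(box: Box) -> Tuple[float, float]:
    """Center (col, row) of a box, in continuous pixel coordinates."""
    x_min, y_min, x_max, y_max = box
    return (x_min + x_max) / 2.0, (y_min + y_max) / 2.0


def pixel_to_geo(col: float, row: float, gt: GeoTransform) -> Tuple[float, float]:
    """Apply the affine geotransform to a full-mosaic pixel position."""
    x0, pw, rr, y0, cr, ph = gt
    x = x0 + col * pw + row * rr
    y = y0 + col * cr + row * ph
    return x, y


def slice_box_to_geo(box: Box, slice_offset: Tuple[int, int],
                     gt: GeoTransform) -> Tuple[float, float]:
    """Hypothetical helper: slice-local box -> georeferenced center (X, Y)."""
    cx, cy = box_center(box)
    off_col, off_row = slice_offset  # top-left pixel of the slice in the mosaic
    return pixel_to_geo(cx + off_col, cy + off_row, gt)


if __name__ == "__main__":
    # Hypothetical 5 cm/pixel, north-up UTM geotransform.
    gt = (600000.0, 0.05, 0.0, 9900000.0, 0.0, -0.05)
    print(slice_box_to_geo((100, 200, 140, 260), (1000, 2000), gt))
```

Here the box center (120, 230) on a slice offset by (1000, 2000) becomes mosaic pixel (1120, 2230), which lands about 56 m east and 111.5 m south of the mosaic origin. A mask centroid from SAM 2 could replace `box_center` if it gives a better estimate of the crown center.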

Abstract

Palms are ecologically and economically important: they are indicators of tropical forest health, biodiversity, and human impact, and they support local economies and global forest product supply chains. While palm detection in plantations is well studied, efforts to map naturally occurring palms in dense forests remain limited by overlapping crowns, uneven shading, and heterogeneous landscapes. We develop PRISM (Processing, Inference, Segmentation, and Mapping), a flexible pipeline for detecting and localizing palms in dense tropical forests using large orthomosaic images. Orthomosaics are created from thousands of aerial images and span several to hundreds of gigabytes. Our contributions are threefold. First, we construct a large UAV-derived orthomosaic dataset collected across 21 ecologically diverse sites in western Ecuador, annotated with 8,830 bounding boxes and 5,026 palm center points. Second, we evaluate multiple state-of-the-art object detectors in terms of efficiency and performance, integrate zero-shot SAM 2 as the segmentation backbone, and refine the results for precise geographic mapping. Third, we apply calibration methods to align confidence scores with IoU and explore saliency maps for feature explainability. Though optimized for palms, PRISM is adaptable to identifying other natural objects, such as eastern white pines. Future work will explore transfer learning for lower-resolution datasets (0.5 to 1 m).
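Aligning confidence scores with IoU means that a box scored 0.8 should, on average, overlap its ground truth with an IoU near 0.8. The abstract does not name the calibration methods used, so the sketch below is an assumption: it measures a binned confidence-versus-IoU gap (an ECE-style metric with IoU as the target) and fits histogram binning, one standard post-hoc recalibration. It assumes each prediction already has the IoU of its best-matching ground-truth box (0 if unmatched); the function names are hypothetical.

```python
import numpy as np


def _bin_ids(conf: np.ndarray, n_bins: int) -> np.ndarray:
    """Assign each confidence to one of n_bins equal-width bins on [0, 1]."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Interior edges give ids 0..n_bins-1; conf == 1.0 falls in the last bin.
    return np.digitize(conf, edges[1:-1])


def iou_calibration_error(conf, iou, n_bins: int = 10) -> float:
    """Bin-size-weighted mean of |mean(conf) - mean(IoU)| over non-empty bins."""
    conf = np.asarray(conf, dtype=float)
    iou = np.asarray(iou, dtype=float)
    ids = _bin_ids(conf, n_bins)
    err = 0.0
    for b in range(n_bins):
        mask = ids == b
        if mask.any():
            err += mask.mean() * abs(conf[mask].mean() - iou[mask].mean())
    return float(err)


def fit_histogram_binning(conf, iou, n_bins: int = 10) -> np.ndarray:
    """Per-bin lookup table mapping a confidence bin to its mean observed IoU."""
    conf = np.asarray(conf, dtype=float)
    iou = np.asarray(iou, dtype=float)
    ids = _bin_ids(conf, n_bins)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    table = (edges[:-1] + edges[1:]) / 2.0  # empty bins fall back to bin centers
    for b in range(n_bins):
        mask = ids == b
        if mask.any():
            table[b] = iou[mask].mean()
    return table


def apply_histogram_binning(conf, table: np.ndarray) -> np.ndarray:
    """Replace each raw confidence with the calibrated value of its bin."""
    conf = np.asarray(conf, dtype=float)
    return table[_bin_ids(conf, len(table))]


if __name__ == "__main__":
    conf, iou = [0.9, 0.9], [0.5, 0.7]
    table = fit_histogram_binning(conf, iou)
    print(iou_calibration_error(conf, iou))
    print(iou_calibration_error(apply_histogram_binning(conf, table), iou))
```

In practice the table would be fit on held-out sites and evaluated on others, which matches the cross-site setting described for the PALMS dataset.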

Paper Structure

This paper contains 20 sections, 2 equations, 8 figures, 4 tables.

Figures (8)

  • Figure 1: Palm Distribution Comparison. The first three images, from previous studies [gibril2021deep, jintasuttisak2022deep], feature evenly spaced palms or clear backgrounds, while the last represents our case with natural spacing, occlusions, and complex backgrounds in tropical forests.
  • Figure 2: Geographic Locations of Study Sites. The left panel shows a map of Ecuador with red stars marking the study regions. The right panels zoom in on 21 study areas within four ecological sites.
  • Figure 3: PRISM Pipeline Overview. The detection model, trained on the PALMS dataset, processes orthomosaic slices to generate confidence scores and bounding boxes. These bounding boxes are refined and serve as prompts, along with the sliced input images, for zero-shot segmentation. The bounding boxes and confidence scores are further utilized for saliency map generation and calibration analysis.
  • Figure 4: Comparison of Palm Detection Performance. Several models are compared in detecting palms, including small, occluded, and boundary-adjacent cases. All models perform well on large palms, even in occluded scenarios. DETR-based models excel at detecting small palms, while YOLO-based models perform better for partially visible palms on boundaries.
  • Figure 5: Zero-shot segmentation of SAM variants under distribution shifts. Rows correspond to SAM variants, while columns represent four distinct reserves. The box prompts were derived from the detection model trained on geographically distinct data.
  • ...and 3 more figures
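Figure 3 describes running the detector on orthomosaic slices and then refining the resulting boxes. A minimal sketch of that slice-and-merge step follows, assuming square overlapping tiles and greedy non-maximum suppression (NMS) to remove duplicates that straddle tile borders. The tile size, overlap, IoU threshold, and the use of NMS as the refinement step are all assumptions, and `tile_windows`, `nms`, and `merge_tile_detections` are hypothetical names.

```python
import numpy as np


def tile_windows(width: int, height: int, tile: int = 640, overlap: float = 0.2):
    """Top-left corners (col, row) of overlapping tiles covering the mosaic.

    Mosaics smaller than one tile get a single window at 0 (assumes the
    detector pads the short side).
    """
    stride = max(1, tile - int(round(tile * overlap)))

    def starts(size: int):
        if size <= tile:
            return [0]
        s = list(range(0, size - tile + 1, stride))
        if s[-1] != size - tile:
            s.append(size - tile)  # last tile flush with the border
        return s

    return [(c, r) for r in starts(height) for c in starts(width)]


def nms(boxes, scores, iou_thr: float = 0.5):
    """Greedy NMS; boxes are (N, 4) as x_min, y_min, x_max, y_max."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores)  # highest confidence first
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        iou = inter / (areas[i] + areas[rest] - inter)
        order = rest[iou <= iou_thr]  # drop boxes overlapping the kept one
    return keep


def merge_tile_detections(per_tile):
    """per_tile: list of ((col, row), boxes, scores) with tile-local boxes."""
    all_boxes, all_scores = [], []
    for (col, row), boxes, scores in per_tile:
        for (x1, y1, x2, y2), s in zip(boxes, scores):
            # Shift tile-local coordinates into full-mosaic pixel coordinates.
            all_boxes.append((x1 + col, y1 + row, x2 + col, y2 + row))
            all_scores.append(s)
    if not all_boxes:
        return np.empty((0, 4)), np.empty(0)
    keep = nms(all_boxes, all_scores)
    return np.asarray(all_boxes)[keep], np.asarray(all_scores)[keep]
```

The merged, mosaic-level boxes can then serve as box prompts for SAM 2 and be converted to map coordinates. Real gigabyte-scale orthomosaics would be read window by window (for example with a raster I/O library) rather than loaded whole.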