Assessing SAM for Tree Crown Instance Segmentation from Drone Imagery

Mélisande Teng, Arthur Ouaknine, Etienne Laliberté, Yoshua Bengio, David Rolnick, Hugo Larochelle

TL;DR

Problem: automate tree crown instance segmentation from drone imagery to improve plantation monitoring.

Approach: systematically assess the Segment Anything Model (SAM) in zero-shot and prompting configurations, compare against a task-tuned Mask R-CNN baseline, and extend SAM with DSM-informed prompting (RSPrompter, DSMPrompter).

Contributions: a detailed dataset processing and splitting workflow on the UAV Quebec Plantations data; a comprehensive comparison showing SAM's limitations in its out-of-the-box form and demonstrating that DSM-augmented prompting can surpass traditional baselines, with DSMPrompter delivering the best overall results.

Findings: DSMPrompter achieves the strongest performance, DSM input improves segmentation across methods, and task-specific tuning is essential for SAM to outperform specialized detectors.

Significance: provides practical guidance for deploying drone-based tree monitoring and motivates future work on DSM integration and data-efficient prompting under low-label regimes.

Abstract

The potential of tree planting as a natural climate solution is often undermined by inadequate monitoring of tree planting projects. Current monitoring methods involve measuring trees by hand for each species, requiring extensive cost, time, and labour. Advances in drone remote sensing and computer vision offer great potential for mapping and characterizing trees from aerial imagery, and large pre-trained vision models, such as the Segment Anything Model (SAM), may be a particularly compelling choice given limited labeled data. In this work, we compare SAM methods for the task of automatic tree crown instance segmentation in high-resolution drone imagery of young tree plantations. We explore the potential of SAM for this task and find that methods using SAM out-of-the-box do not outperform a custom Mask R-CNN, even with well-designed prompts, but that there is potential for methods that tune SAM further. We also show that predictions can be improved by adding Digital Surface Model (DSM) information as an input.

Paper Structure

This paper contains 18 sections, 9 figures, 5 tables.

Figures (9)

  • Figure 1: Overview of our DSMPrompter method.
  • Figure 2: Per-class mAP performance on the test set. For each model, the performance is averaged over 3 seeds. Tree species on the x-axis are ordered by decreasing prevalence in the dataset from left to right. Mask R-CNN is pretrained on ImageNet. Numerical results are provided in the appendix.
  • Figure 3: Overview of our SAM+DSM prompts method.
  • Figure 4: Examples of an image, its DSM, and local-maxima prompts (red dots) overlaid on the image.
  • Figure 5: Examples of image and prediction when the DSM is fed as a mask prompt to SAM.
  • ...and 4 more figures
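
The local-maxima prompts mentioned in the Figure 4 caption can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `dsm_local_maxima` and `to_xy_prompts`, the window size, and the height threshold are all assumptions chosen for illustration. The idea is simply that tree tops tend to appear as local peaks in the DSM, so peak pixels are plausible candidates for point prompts.

```python
import numpy as np


def dsm_local_maxima(dsm, window=3, min_height=0.0):
    """Return (row, col) indices of local maxima in a DSM raster.

    Hypothetical helper for illustration only. A pixel is kept when it
    equals the maximum of its window x window neighbourhood and is
    strictly higher than min_height. Flat plateaus yield several points.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    dsm = np.asarray(dsm, dtype=float)
    rows, cols = dsm.shape
    pad = window // 2
    # Pad with -inf so border pixels are compared only with real values.
    padded = np.pad(dsm, pad, mode="constant", constant_values=-np.inf)
    # Neighbourhood maximum built from shifted views of the padded raster.
    neighbourhood_max = np.full(dsm.shape, -np.inf)
    for dr in range(window):
        for dc in range(window):
            shifted = padded[dr:dr + rows, dc:dc + cols]
            neighbourhood_max = np.maximum(neighbourhood_max, shifted)
    is_peak = (dsm == neighbourhood_max) & (dsm > min_height)
    return np.argwhere(is_peak)


def to_xy_prompts(peaks_rc):
    """Convert (row, col) peaks to (x, y) pixel coordinates.

    Assumption: the downstream prompt format expects (x, y) ordering.
    """
    return np.asarray(peaks_rc)[:, ::-1].copy()


# Toy 7x7 DSM with two "tree tops" and one lower shoulder pixel.
toy_dsm = np.zeros((7, 7))
toy_dsm[1, 2] = 5.0
toy_dsm[4, 5] = 3.0
toy_dsm[4, 4] = 1.0  # next to a taller pixel, so not a peak

peaks = dsm_local_maxima(toy_dsm, window=3, min_height=0.5)
prompts_xy = to_xy_prompts(peaks)
print(prompts_xy)
```

In practice the window size and height threshold would need tuning to the ground resolution and to the height of the young trees, and the DSM would have to be aligned pixel-for-pixel with the RGB tile before the points are used as prompts.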