Assessing SAM for Tree Crown Instance Segmentation from Drone Imagery
Mélisande Teng, Arthur Ouaknine, Etienne Laliberté, Yoshua Bengio, David Rolnick, Hugo Larochelle
TL;DR
Problem: automate tree crown instance segmentation from drone imagery to improve plantation monitoring.
Approach: systematically assess the Segment Anything Model (SAM) in zero-shot and prompted configurations, compare it against a task-tuned Mask R-CNN baseline, and extend SAM with learned prompting (RSPrompter) and DSM-informed prompting (DSMPrompter).
Contributions: a detailed dataset processing and splitting workflow for the UAV Quebec Plantations data, and a comprehensive comparison showing the limitations of out-of-the-box SAM and demonstrating that DSM-augmented prompting can surpass traditional baselines.
Findings: DSMPrompter achieves the strongest overall performance, DSM input improves segmentation across methods, and task-specific tuning is essential for SAM to outperform specialized detectors.
Significance: provides practical guidance for deploying drone-based tree monitoring and motivates future work on DSM integration and data-efficient prompting in low-label regimes.
Abstract
The potential of tree planting as a natural climate solution is often undermined by inadequate monitoring of tree planting projects. Current monitoring methods involve measuring trees by hand for each species, which is costly, slow, and labour-intensive. Advances in drone remote sensing and computer vision offer great potential for mapping and characterizing trees from aerial imagery, and large pre-trained vision models, such as the Segment Anything Model (SAM), may be a particularly compelling choice given limited labeled data. In this work, we compare SAM-based methods for automatic tree crown instance segmentation in high-resolution drone imagery of young tree plantations. We find that methods using SAM out-of-the-box do not outperform a custom Mask R-CNN, even with well-designed prompts, but that methods which tune SAM further show potential. We also show that predictions can be improved by adding Digital Surface Model (DSM) information as an input.
