Manual Labelling Artificially Inflates Deep Learning-Based Segmentation Performance on RGB Images of Closed Canopy: Validation Using TLS

Matthew J. Allen, Harry J. F. Owen, Stuart W. D. Grieve, Emily R. Lines

TL;DR

This paper investigates whether pretrained RGB-based crown segmentation models can reliably delineate individual trees in closed-canopy forests when evaluated against high-fidelity ground truth from co-located TLS data. It compares two popular pretrained models, DeepForest and Detectree2, across boreal and Mediterranean forests, using both TLS-derived and manual ground truth without retraining. Performance is far lower against TLS-derived ground truth than against hand-labelled data (AP50 of 0.094 vs. 0.670 for Mediterranean forest), especially at stricter IoU thresholds, and recovers only partially when evaluation is restricted to canopy trees. The study highlights fundamental limitations of aerial RGB segmentation in closed canopies and emphasizes the need for independent ground truth for reliable deployment in forest monitoring and inventory work.

Abstract

Monitoring forest dynamics at an individual tree scale is essential for accurately assessing ecosystem responses to climate change, yet traditional methods relying on field-based forest inventories are labor-intensive and limited in spatial coverage. Advances in remote sensing using drone-acquired RGB imagery combined with deep learning models have promised precise individual tree crown (ITC) segmentation; however, existing methods are frequently validated against human-annotated images, lacking rigorous independent ground truth. In this study, we generate high-fidelity validation labels from co-located Terrestrial Laser Scanning (TLS) data for drone imagery of mixed unmanaged boreal and Mediterranean forests. We evaluate the performance of two widely used deep learning ITC segmentation models - DeepForest (RetinaNet) and Detectree2 (Mask R-CNN) - on these data, and compare this to their performance on further Mediterranean forest data labelled manually. When validated against TLS-derived ground truth from Mediterranean forests, model performance decreased significantly compared to assessment based on hand-labelled data from an ecologically similar site (AP50: 0.094 vs. 0.670). Restricting evaluation to only canopy trees shrank this gap considerably (Canopy AP50: 0.365), although performance was still far lower than on similar hand-labelled data. Models also performed poorly on boreal forest data (AP50: 0.142), although performance again increased when evaluated on canopy trees only (Canopy AP50: 0.308). Both models showed very poor localisation accuracy at stricter IoU thresholds, even when restricted to canopy trees (Max AP75: 0.051). Similar results have been observed in studies using aerial LiDAR data, suggesting fundamental limitations in aerial-based segmentation approaches in closed canopy forests.
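To make the AP50 and AP75 figures concrete, the following is a minimal sketch of average precision at a fixed IoU threshold for axis-aligned bounding boxes. It is an illustration only, not the authors' evaluation code: the greedy one-to-one matching, the non-interpolated area under the precision-recall curve, and the names `box_iou`, `average_precision`, `truths` and `preds` are all assumptions introduced here, and the toy boxes are made up.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds: List[Tuple[Box, float]],
                      truths: List[Box],
                      iou_thr: float) -> float:
    """Non-interpolated AP at one IoU threshold (assumed protocol).

    Predictions are visited in descending score order; each is matched to
    the unmatched ground-truth box with the highest IoU, and counts as a
    true positive only if that IoU reaches iou_thr.
    """
    if not truths:
        return 0.0
    matched = set()
    hits = []
    for box, _score in sorted(preds, key=lambda p: p[1], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, truth in enumerate(truths):
            if j in matched:
                continue
            iou = box_iou(box, truth)
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_j >= 0 and best_iou >= iou_thr:
            matched.add(best_j)
            hits.append(True)
        else:
            hits.append(False)
    # Sum precision at each true positive, weighted by the recall step 1/|truths|.
    ap, tp = 0.0, 0
    for rank, hit in enumerate(hits, start=1):
        if hit:
            tp += 1
            ap += (tp / rank) / len(truths)
    return ap


# Toy data (hypothetical): one exact prediction, one slightly offset, one spurious.
truths = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [((0, 0, 10, 10), 0.9), ((21, 21, 31, 31), 0.8), ((50, 50, 60, 60), 0.7)]
ap50 = average_precision(preds, truths, 0.5)
ap75 = average_precision(preds, truths, 0.75)
```

In this toy case the offset prediction has an IoU of about 0.68 with its crown, so it counts as correct at the 0.5 threshold but not at 0.75. The same mechanism explains why a model can score moderately on AP50 yet very poorly on AP75 when localisation is loose.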

Paper Structure

This paper contains 20 sections, 5 figures, 3 tables, 1 algorithm.

Figures (5)

  • Figure 1: Precision-recall curves for (top) DeepForest and (bottom) Detectree2 at the best hyperparameters for each site. Performance was assessed at IoU thresholds of 0.5 (red lines) and 0.75 (blue lines). Precision-recall curves obtained when scoring against canopy trees only are shown using dashed lines for both IoU thresholds.
  • Figure 2: (Top) Image data with (Bottom) accompanying TLS-derived labels (Hashed, red) and Detectree2 predictions (Outlined, green) using the best hyperparameters. Several common sources of error are shown. (A) Large canopy tree segmented correctly. (B1-3) Canopy trees where the correct delineation is visually ambiguous - (B1,2) depict tightly grouped individual trees. (B3) is visually similar from above but is a single tree. (C) Sub-canopy trees that might be possible to predict but are generally omitted during hand-labelling. (D) Sub-canopy trees that are nominally visible from above but practically invisible due to shadowing or orthomosaic artefacting.
  • Figure 3: Example of minor misalignment between TLS and image data. (Left) Raw imagery of a single tree. (Middle) Output polygon from Algorithm 1, with bounding box added for illustration. Ground truth polygons removed for surrounding trees to reduce visual clutter. Note cropping due to plot-edge effects. Predictions were also cropped to the extent of the plot after all other post-processing to eliminate the influence of plot-edge effects on evaluation. (Right) Manually corrected polygon and bounding box (blue) overlaid on originals (red). The minor misalignment of individual branches causes a more pronounced shift in the full polygon than the bounding box. We evaluated segmentation using bounding boxes only.
  • Figure 4: Gridsearch results for DeepForest on Almorox (Manual labelling), Alto Tajo (TLS Labelling) and Joensuu (TLS Labelling). (Top row) Overall performance. (Bottom row) Canopy performance. Top value within each cell denotes AP measured at an IoU of 0.5, and bottom values at 0.75. Cells coloured by AP50 for both overall and canopy, and best result outlined in green.
  • Figure 5: Gridsearch results for Detectree2 on Almorox (Manual labelling), Alto Tajo (TLS Labelling) and Joensuu (TLS Labelling). (Top row) Overall performance. (Bottom row) Canopy performance. Top value within each cell denotes AP measured at an IoU of 0.5, and bottom values at 0.75. Cells coloured by AP50 for both overall and canopy, and best result outlined in green.
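
The Figure 3 caption describes two post-processing steps: reducing crown polygons to bounding boxes, and cropping to the plot extent. A minimal sketch of both follows, under stated assumptions: the function names `polygon_to_bbox` and `clip_box` are hypothetical, polygons are plain lists of (x, y) vertices, and the plot is treated as an axis-aligned rectangle. Clipping a box directly is a simplification; cropping a polygon to the plot first and then taking its bounding box can give a tighter result.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def polygon_to_bbox(vertices: List[Tuple[float, float]]) -> Box:
    """Axis-aligned bounding box of a polygon given as (x, y) vertices."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return (min(xs), min(ys), max(xs), max(ys))


def clip_box(box: Box, extent: Box) -> Optional[Box]:
    """Crop a box to a rectangular plot extent; None if nothing remains."""
    xmin, ymin = max(box[0], extent[0]), max(box[1], extent[1])
    xmax, ymax = min(box[2], extent[2]), min(box[3], extent[3])
    if xmin >= xmax or ymin >= ymax:
        return None  # box lies entirely outside the plot
    return (xmin, ymin, xmax, ymax)


# Hypothetical crown outline and plot extent, in arbitrary map units.
crown = [(2, 1), (5, 4), (3, 6)]
plot_extent = (0, 0, 4, 4)
crown_box = polygon_to_bbox(crown)
cropped = clip_box(crown_box, plot_extent)
```

Because a bounding box depends only on the extreme vertices, shifting an interior branch of the polygon leaves the box unchanged, which is consistent with the caption's observation that misalignment affects the full polygon more than the box.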