SilvaScenes: Tree Segmentation and Species Classification from Under-Canopy Images in Natural Forests

David-Alexandre Duclos, William Guimont-Martin, Gabriel Jeanson, Arthur Larochelle-Tremblay, Théo Defosse, Frédéric Moore, Philippe Nolet, François Pomerleau, Philippe Giguère

TL;DR

SilvaScenes presents a ground-level under-canopy dataset for tree instance segmentation and species classification in natural forests, addressing the gap in realistic, multi-species, under-canopy benchmarks. With 172 images, 1476 trees, and 24 species across five bioclimatic domains in Quebec, the dataset captures challenging conditions such as occlusion and variable lighting, and is annotated at the trunk level by forestry experts. Baseline experiments with YOLOv11/12 and Mask2Former (Swin backbones) reveal that while trunk segmentation is feasible, fine-grained species classification remains significantly harder (best mAP of 35.69%), underscoring the need for higher-resolution data and richer metadata. The work provides a public dataset and baseline models to advance semantic perception for robotic forestry, with practical implications for automation, biodiversity monitoring, and SLAM-based data association in dense forests.

Abstract

Interest in robotics for forest management is growing, but perception in complex, natural environments remains a significant hurdle. Conditions such as heavy occlusion, variable lighting, and dense vegetation pose challenges to automated systems, which are essential for precision forestry, biodiversity monitoring, and the automation of forestry equipment. These tasks rely on advanced perceptual capabilities, such as detection and fine-grained species classification of individual trees. Yet, existing datasets are inadequate to develop such perception systems, as they often focus on urban settings or a limited number of species. To address this, we present SilvaScenes, a new dataset for instance segmentation of tree species from under-canopy images. Collected across five bioclimatic domains in Quebec, Canada, SilvaScenes features 1476 trees from 24 species with annotations from forestry experts. We demonstrate the relevance and challenging nature of our dataset by benchmarking modern deep learning approaches for instance segmentation. Our results show that, while tree segmentation is easy, with a top mean average precision (mAP) of 67.65%, species classification remains a significant challenge with an mAP of only 35.69%. Our dataset and source code will be available at https://github.com/norlab-ulaval/SilvaScenes.

Paper Structure

This paper contains 19 sections, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Example of an annotated image in our dataset, SilvaScenes. Instance segmentation masks are provided for tree trunks and are color-coded by species. The image illustrates the complex conditions frequently found in natural forests, such as occlusion and varying lighting.
  • Figure 2: Statistics of SilvaScenes. (a) Number of trees per image. (b) Number of species per image. (c) Log-scale distribution of tree width in our images.
  • Figure 3: Confusion matrix of M2F-Large over five folds. Results are row-normalized and expressed in percentages. Species are split into deciduous, coniferous, and Other, and grouped to highlight inter- and intra-genus confusion.
  • Figure 4: Examples of instance segmentation predictions with M2F-Large. Key discrepancies between our ground truth and the model's predictions are highlighted with ellipses.
  • Figure 5: Impact of image resolution on performance of M2F-Large. The notation A$\rightarrow$B signifies that the model was trained on task A and evaluated on task B. Bands show the IQR over five folds. Note that the image resolution is on a log scale.
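
The captions for Figures 3 and 5 mention two summary operations: row-normalizing a confusion matrix into percentages, and reporting an interquartile range (IQR) over five folds. The sketch below illustrates both in plain Python. It is not the authors' code: the function names, the example counts, and the fold scores are all hypothetical, and the quantile method used for the IQR is an assumption, since the captions do not specify one.

```python
from statistics import quantiles


def row_normalize_percent(matrix):
    """Convert raw counts to row-wise percentages (each non-empty row sums to 100)."""
    normalized = []
    for row in matrix:
        total = sum(row)
        # An all-zero row (class absent from the ground truth) stays all zeros.
        normalized.append([100.0 * c / total if total else 0.0 for c in row])
    return normalized


def interquartile_range(values):
    """Return (Q1, Q3); the 'inclusive' method is an assumption, not the paper's choice."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q1, q3


# Hypothetical counts for three classes (rows: ground truth, columns: prediction).
counts = [
    [8, 2, 0],
    [1, 6, 3],
    [0, 0, 0],
]
percent = row_normalize_percent(counts)

# Hypothetical per-fold scores; these are not results from the paper.
fold_scores = [30.0, 32.0, 34.0, 36.0, 38.0]
q1, q3 = interquartile_range(fold_scores)
```

With row normalization, each row reads as "of all trees of this true species, what share was predicted as each class", so the diagonal gives per-class recall regardless of how many trees each species has.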