
OAM-TCD: A globally diverse dataset of high-resolution tree cover maps

Josh Veitch-Michaelis, Andrew Cottam, Daniella Schweizer, Eben N. Broadbent, David Dao, Ce Zhang, Angelica Almeyda Zambrano, Simeon Max

TL;DR

OAM-TCD introduces a globally diverse, high-resolution dataset for tree crown delineation with instance-level annotations, addressing the scarcity of open, sub-meter tree mapping data. It provides 5072 tiles of 2048×2048 px at 10 cm/px, containing over 280k individual trees and 56k canopy groups, along with baseline semantic and instance segmentation models and an open-source processing pipeline. The authors evaluate models via biome-stratified k-fold cross-validation and holdout tests, showing that SegFormer generally outperforms UNet for semantic segmentation and that Mask R-CNN delivers competitive instance segmentation results, with further validation on independent data from Zurich and Tonga. They also discuss licensing, data diversity, and limitations, including biases and underrepresented biomes, and outline future work to expand biome coverage, improve annotation consistency, and enable foundation-model benchmarking for global tree detection.
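To make "biome-stratified k-fold" concrete, here is a minimal sketch of one way such a split could be built. It is not the authors' procedure, which this summary does not describe. The function name `biome_stratified_folds`, the input format (a dict from tile ID to biome label), and the choice of k = 5 are all assumptions for illustration.

```python
import random
from collections import Counter, defaultdict


def biome_stratified_folds(tile_biomes, k=5, seed=0):
    """Assign each tile ID to one of k folds, spreading every biome across folds.

    tile_biomes: dict mapping tile ID -> biome label (hypothetical input format).
    Returns a dict mapping tile ID -> fold index in range(k).
    """
    rng = random.Random(seed)

    # Group tile IDs by biome.
    by_biome = defaultdict(list)
    for tile_id, biome in tile_biomes.items():
        by_biome[biome].append(tile_id)

    folds = {}
    offset = 0
    for biome in sorted(by_biome):
        tiles = sorted(by_biome[biome])
        rng.shuffle(tiles)  # random order within the biome, reproducible via seed
        # Deal tiles round-robin, continuing from where the previous biome
        # stopped so that small biomes do not all land in fold 0.
        for i, tile_id in enumerate(tiles):
            folds[tile_id] = (i + offset) % k
        offset += len(tiles)
    return folds


# Toy example: 10 "forest" tiles and 5 "savanna" tiles (made-up labels).
tiles = {i: ("forest" if i < 10 else "savanna") for i in range(15)}
folds = biome_stratified_folds(tiles, k=5)
forest_per_fold = Counter(folds[t] for t in tiles if tiles[t] == "forest")
savanna_per_fold = Counter(folds[t] for t in tiles if tiles[t] == "savanna")
```

In the toy example every fold receives two forest tiles and one savanna tile, so each held-out fold has the same biome mix as the whole set.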

Abstract

Accurate quantification of tree cover is important for ecosystem monitoring and for assessing progress in restored sites. Recent work has shown that deep learning-based segmentation algorithms are capable of accurately mapping trees at country and continental scales using high-resolution aerial and satellite imagery. Mapping at high (ideally sub-meter) resolution is necessary to identify individual trees; however, there are few open-access datasets containing instance-level annotations, and those that exist are small or not geographically diverse. We present a novel open-access dataset for individual tree crown delineation (TCD) in high-resolution aerial imagery sourced from OpenAerialMap (OAM). Our dataset, OAM-TCD, comprises 5072 images of 2048×2048 px at 10 cm/px resolution with associated human-labeled instance masks for over 280k individual trees and 56k groups of trees. By sampling imagery from around the world, we are able to better capture the diversity and morphology of trees in different terrestrial biomes and in both urban and natural environments. Using our dataset, we train reference instance and semantic segmentation models that compare favorably to existing state-of-the-art models. We assess performance through k-fold cross-validation and comparison with existing datasets; additionally, we demonstrate compelling results on independent aerial imagery captured over Switzerland and compare to municipal tree inventories and LIDAR-derived canopy maps in the city of Zurich. Our dataset, models and training/benchmark code are publicly released under permissive open-source licenses: Creative Commons (majority CC BY 4.0) and Apache 2.0, respectively.
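As a rough sense of scale, the tile size and resolution quoted above can be turned into ground area. The sketch below is back-of-the-envelope arithmetic only: it assumes every tile is exactly 2048×2048 px at a nominal 10 cm/px and that tiles do not overlap, neither of which is confirmed here. The tree count is the quoted lower bound ("over 280k"), so the per-tile figure is a lower bound too.

```python
TILE_SIDE_PX = 2048       # tile width and height in pixels (from the abstract)
GSD_M = 0.10              # ground sample distance in metres per pixel (10 cm/px)
N_TILES = 5072            # number of tiles (from the abstract)
N_TREES_MIN = 280_000     # "over 280k" individual trees, used as a lower bound

tile_side_m = TILE_SIDE_PX * GSD_M                 # ground length of one tile edge
tile_area_ha = tile_side_m ** 2 / 10_000           # 1 ha = 10,000 m^2
total_area_km2 = N_TILES * tile_side_m ** 2 / 1e6  # assumes no overlap between tiles
trees_per_tile_min = N_TREES_MIN / N_TILES         # mean individual trees per tile

print(f"tile edge:      {tile_side_m:.1f} m")
print(f"tile area:      {tile_area_ha:.2f} ha")
print(f"total area:     {total_area_km2:.1f} km^2")
print(f"trees per tile: at least {trees_per_tile_min:.0f} on average")
```

Under these assumptions each tile covers about 204.8 m × 204.8 m (roughly 4.2 ha), and the full set covers on the order of 213 km² of ground.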

Paper Structure (70 sections, 5 figures, 4 tables)

Figures (5)

  • Figure 1: Annotation examples from the OAM-TCD dataset. Source images are 2048x2048 px tiles at 10 cm/px resolution. Individual instances are labelled with different colors. Annotators were instructed to label individual trees if possible, and otherwise label regions as groups. Image credit: contributors to Open Imagery Network, CC BY 4.0.
  • Figure 2: Geospatial distribution of imagery in the dataset. It is clear that some locations are under-represented, but among open-access data, we believe OAM-TCD is the most geographically diverse of its type. The lack of imagery from some regions is due to inherent biases in the data that are uploaded to OAM.
  • Figure 3: Tree semantic segmentation for Zurich, predicted at 10 cm/px. Predictions with a confidence of < 0.4 are hidden. Left: 10 cm RGB orthomosaic provided by the Swiss Federal Office of Topography swisstopo/SWISSIMAGE 10 cm (2022), Right: prediction heat map. Zooming in is recommended to see small details, e.g. trees along the top edge of the lake. Base map tiles by Stamen Design, under CC BY 4.0. Data by OpenStreetMap, under ODbL.
  • Figure 4: Further example annotations from the OAM-TCD test split. Left: RGB image, Middle: ground truth segmentation randomly coloured by segment ID, Right: coloured by class (blue = tree, orange = tree canopy). All images licensed CC BY 4.0, contributors to Open Imagery Network. OAM-TCD IDs, top to bottom: 555, 1445, 1594, 2242.
  • Figure 5: Semantic segmentation predictions for the WeRobotics Open AI challenge image over the Kingdom of Tonga, using the restor/tcd-segformer-mit-b5 model. Individual palm trees are clearly segmented. Some uncertain predictions are visible in the lower region of the image near the coast - identifiable as missing/inconsistent tiles.
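
The Figure 3 caption notes that predictions with a confidence below 0.4 are hidden in the heat map. A minimal, dependency-free sketch of that kind of display threshold is shown below. The function name `hide_low_confidence` and the nested-list input are hypothetical; a real pipeline would more likely operate on raster arrays, and this is not the authors' plotting code.

```python
def hide_low_confidence(heatmap, threshold=0.4):
    """Return a copy of a 2D confidence map with low values replaced by None.

    heatmap: list of rows, each a list of per-pixel tree confidences in [0, 1].
    Values strictly below `threshold` are hidden; values at or above it are kept.
    """
    return [
        [p if p >= threshold else None for p in row]
        for row in heatmap
    ]


# Toy 2x2 confidence map (made-up values).
example = [
    [0.10, 0.40],
    [0.95, 0.39],
]
visible = hide_low_confidence(example)
```

In the toy map only the 0.40 and 0.95 pixels stay visible; the 0.10 and 0.39 pixels are hidden, and the input list is left unmodified.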