
Forest Inspection Dataset for Aerial Semantic Segmentation and Depth Estimation

Bianca-Cerasela-Zelia Blaga, Sergiu Nedevschi

TL;DR

This work addresses the lack of large, annotated forestry data for UAV-driven semantic segmentation and depth estimation by introducing the Forest Inspection Dataset, which blends real WildUAV imagery with a synthetic AirSim-based forest collection containing dense semantic labels and depth. It benchmarks two multi-scale architectures, HRNet and PointFlow, and demonstrates that training on a diverse, multi-condition dataset plus synthetic pretraining yields better generalization and boundary accuracy, with transfer learning effectively bridging the synthetic-real gap. A deforestation assessment framework is developed by constructing semantically labeled 3D point clouds and quantifying healthy versus deforested areas over time, highlighting practical applications in forest monitoring and autonomous navigation. The dataset and findings advance forest surveillance capabilities and provide a foundation for integrating depth information into aerial scene understanding.

Abstract

UAVs are used to monitor changes in forest environments because they are lightweight and provide a wide variety of surveillance data. However, the raw recordings do not carry enough detail for the scene understanding needed to assess the degree of deforestation. Deep learning algorithms must be trained on large amounts of data to produce accurate interpretations, but ground-truth annotations of forest imagery are not available. To solve this problem, we introduce a new large aerial dataset for forest inspection that contains both real-world and virtual recordings of natural environments, with densely annotated semantic segmentation labels and depth maps, captured under different illumination conditions, at various altitudes and recording angles. We test the performance of two multi-scale neural networks on the semantic segmentation task (HRNet and the PointFlow network), studying the impact of the various acquisition conditions and the capabilities of transfer learning from virtual to real data. Our results show that the best performance is obtained when training on a dataset containing a large variety of scenarios, rather than separating the data into specific categories. We also develop a framework to assess the degree of deforestation of an area.
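
As a rough illustration of what a deforestation measure over semantic labels could look like, the sketch below computes the fraction of forest-related pixels that are labeled as deforested in a single label map. It is not the paper's framework, which works on semantically labeled 3D point clouds. The class ids, the function name `deforestation_degree`, and the ratio itself are assumptions made for this example.

```python
from collections import Counter
from itertools import chain

# Hypothetical class ids: the dataset's real label encoding is not given here.
HEALTHY_IDS = {1}
DEFORESTED_IDS = {2}


def deforestation_degree(label_map, healthy_ids=HEALTHY_IDS,
                         deforested_ids=DEFORESTED_IDS):
    """Return deforested / (healthy + deforested) over a 2D label map.

    label_map is a list of rows of integer class ids. Pixels whose class
    is in neither set (e.g. water, road) are ignored. Returns None when
    the map contains no forest-related pixels at all.
    """
    counts = Counter(chain.from_iterable(label_map))
    healthy = sum(counts[c] for c in healthy_ids)
    deforested = sum(counts[c] for c in deforested_ids)
    total = healthy + deforested
    return deforested / total if total else None


# Toy 2x3 label map: three healthy pixels, two deforested, one ignored.
example = [[1, 1, 2],
           [1, 2, 0]]
degree = deforestation_degree(example)
```

Evaluating the same ratio on label maps of one area recorded at different dates would give a simple change-over-time signal, under the same assumptions.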

Paper Structure

This paper contains 18 sections, 11 equations, 12 figures, 8 tables.

Figures (12)

  • Figure 1: Semantic segmentation labels superimposed on the camera recording from WildUAV.
  • Figure 2: Example of color, semantic labels, and depth map from the synthetic dataset.
  • Figure 3: Distribution of semantic classes in the WildUAV real dataset.
  • Figure 4: Pairs of manually annotated images from WildUAV, showcasing the large variability of terrain and class labeling.
  • Figure 5: Distribution of semantic classes in the Forest Inspection synthetic set.
  • ...and 7 more figures