Training point-based deep learning networks for forest segmentation with synthetic data
Francisco Raverta Capua, Juan Schandin, Pablo De Cristóforis
TL;DR
This paper tackles the lack of labeled forest point clouds for training deep learning segmentation models. It introduces an open-source Unity-based forest simulator that procedurally generates realistic LiDAR-like and camera-like synthetic datasets with ground-truth labels for terrain, trunk, canopy, and understorey. Four state-of-the-art point-based networks (PointNeXt, PointBERT, PointMAP, and PointGPT) are trained on the synthetic data and evaluated on the Evo forest dataset, showing that models trained on synthetic data can generalize to real-world forests. The authors discuss how performance depends on the data type and propose pre-training on synthetic data followed by fine-tuning on real data as a promising direction. The tools and datasets are publicly available to promote reproducibility.
Abstract
The use of remote sensing through unmanned aerial systems (UAS) in forestry has increased in recent years, along with the use of machine learning for data processing. Deep learning architectures, extensively applied in natural language and image processing, have recently been extended to the point cloud domain. However, the availability of point cloud datasets for training and testing remains limited. Creating point cloud datasets of forested environments is expensive, requires high-precision sensors, and is time-consuming, since the points must be classified manually. Moreover, forest areas may be inaccessible or dangerous for humans, further complicating data collection. This raises the question of whether synthetic data can be used to train deep learning networks without relying on large volumes of real forest data. To answer this question, we developed a realistic simulator that procedurally generates synthetic forest scenes. Using this simulator, we conducted a comparative study of different state-of-the-art point-based deep learning networks for forest segmentation. With the generated datasets, we determined the feasibility of using synthetic data to train deep learning networks to classify point clouds from real forest datasets. Both the simulator and the datasets are released as part of this work.
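Comparing networks on point-wise forest segmentation requires a per-point score against the ground-truth classes (terrain, trunk, canopy, understorey). The abstract does not name the metric, so the following is only a minimal sketch assuming per-class intersection-over-union (IoU) and its mean (mIoU), a common choice for semantic segmentation. The integer class ids and the function names `per_class_iou` and `mean_iou` are hypothetical and are not taken from the released tools.

```python
from typing import Dict, Sequence

# Hypothetical label mapping: class names follow the TL;DR,
# but the integer ids are an assumption for illustration only.
CLASSES = {0: "terrain", 1: "trunk", 2: "canopy", 3: "understorey"}


def per_class_iou(pred: Sequence[int], target: Sequence[int],
                  num_classes: int = len(CLASSES)) -> Dict[int, float]:
    """Compute intersection-over-union for each class from point-wise labels.

    Classes absent from both prediction and ground truth get NaN so they
    can be excluded from the mean.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must label the same points")
    tp = [0] * num_classes  # points correctly assigned to class c
    fp = [0] * num_classes  # points wrongly assigned to class c
    fn = [0] * num_classes  # points of class c assigned to another class
    for p, t in zip(pred, target):
        if p == t:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1
    ious: Dict[int, float] = {}
    for c in range(num_classes):
        denom = tp[c] + fp[c] + fn[c]
        ious[c] = tp[c] / denom if denom else float("nan")
    return ious


def mean_iou(ious: Dict[int, float]) -> float:
    """Average the IoU over classes, skipping NaN entries."""
    valid = [v for v in ious.values() if v == v]  # NaN != NaN
    return sum(valid) / len(valid) if valid else float("nan")


if __name__ == "__main__":
    # Toy labels for six points; not data from the paper.
    pred = [0, 0, 1, 2, 2, 3]
    target = [0, 1, 1, 2, 2, 2]
    scores = per_class_iou(pred, target)
    for c, name in CLASSES.items():
        print(f"{name}: {scores[c]:.3f}")
    print(f"mIoU: {mean_iou(scores):.3f}")
```

Per-class scores matter here because thin structures such as trunks cover far fewer points than terrain or canopy, so overall point accuracy alone could hide poor trunk segmentation.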
