Advancing ALS Applications with Large-Scale Pre-training: Dataset Development and Downstream Assessment

Haoyi Xiu, Xin Liu, Taehoon Kim, Kyoung-Sook Kim

TL;DR

This work tackles the lack of large-scale pre-training data for airborne laser scanning (ALS) by constructing a continental-scale ALS tile dataset from USGS 3DEP using a geospatial sampling strategy guided by NLCD land cover and DEM-derived slope. The authors adopt BEV-MAE, a masked autoencoder for 3D outdoor point clouds, pre-train it on this dataset, and fine-tune it on tree species classification, terrain scene recognition, and point cloud semantic segmentation. Pre-training yields consistent improvements over scratch baselines on the tree species and terrain tasks, with gains driven by dataset scale and the sampling strategy, while the benefits for urban segmentation are more modest. The study demonstrates the transferability of representations learned from the proposed ALS dataset and points to two directions for future work: richer reconstruction targets and ALS-specific SSL methods. Code and pre-trained models are to be released publicly.

Abstract

The pre-training and fine-tuning paradigm has revolutionized satellite remote sensing applications. However, this approach remains largely underexplored for airborne laser scanning (ALS), an important technology for applications such as forest management and urban planning. In this study, we address this gap by constructing a large-scale ALS point cloud dataset and evaluating its impact on downstream applications. Our dataset comprises ALS point clouds collected across the contiguous United States, provided by the United States Geological Survey's 3D Elevation Program. To ensure efficient data collection while capturing diverse land cover and terrain types, we introduce a geospatial sampling method that selects point cloud tiles based on land cover maps and digital elevation models. As a baseline self-supervised learning model, we adopt BEV-MAE, a state-of-the-art masked autoencoder for 3D outdoor point clouds, and pre-train it on the constructed dataset. The pre-trained models are subsequently fine-tuned for downstream tasks, including tree species classification, terrain scene recognition, and point cloud semantic segmentation. Our results show that the pre-trained models significantly outperform their scratch counterparts across all downstream tasks, demonstrating the transferability of the representations learned from the proposed dataset. Furthermore, we observe that scaling the dataset using our geospatial sampling method consistently enhances performance, whereas pre-training on datasets constructed with random sampling fails to achieve similar improvements. These findings highlight the utility of the constructed dataset and the effectiveness of our sampling strategy in the pre-training and fine-tuning paradigm. The source code and pre-trained models will be made publicly available at https://github.com/martianxiu/ALS_pretraining.
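
The abstract describes selecting tiles by land cover and terrain rather than uniformly at random. The exact procedure is not given in this summary, so the following is only a minimal sketch of one plausible reading: plain stratified sampling over (land cover, slope class) pairs. The function name `stratified_sample`, the tile dictionary fields, and the `per_stratum` quota are all hypothetical and are not taken from the paper or its repository.

```python
import random
from collections import defaultdict


def stratified_sample(tiles, per_stratum, seed=0):
    """Pick up to `per_stratum` tiles from each (land cover, slope class) stratum.

    `tiles` is a list of dicts with the hypothetical keys
    'id', 'land_cover', and 'slope_class'.
    """
    rng = random.Random(seed)

    # Group tiles by their stratum key.
    strata = defaultdict(list)
    for tile in tiles:
        strata[(tile["land_cover"], tile["slope_class"])].append(tile)

    # Draw the same quota from every stratum so that rare combinations
    # (e.g. steep developed land) are not swamped by common ones.
    selected = []
    for key in sorted(strata):
        group = strata[key]
        quota = min(per_stratum, len(group))
        selected.extend(rng.sample(group, quota))
    return selected


if __name__ == "__main__":
    # Toy inventory: many flat forest tiles, very few steep developed tiles.
    inventory = [
        {"id": i, "land_cover": "Forest", "slope_class": "Flat"} for i in range(100)
    ] + [
        {"id": 100 + i, "land_cover": "Developed", "slope_class": "Steep"}
        for i in range(3)
    ]
    picked = stratified_sample(inventory, per_stratum=5)
    print(len(picked), "tiles selected")
```

With a per-stratum quota, the rare stratum contributes all of its tiles while the common one is capped. Uniform random sampling over the same inventory would instead return almost only flat forest tiles, which is one way to picture why the sampling strategy could matter for diversity.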

Paper Structure

This paper contains 28 sections, 3 equations, 8 figures, 13 tables.

Figures (8)

  • Figure 1: Overview of the dataset development procedure: Land cover data, DEM, and point cloud boundaries are used to selectively download point cloud tiles from a remote server provided by 3DEP. The point clouds are visualized with elevation-based coloring, where cooler colors represent lower elevations and warmer colors indicate higher elevations.
  • Figure 2: LiDAR point cloud boundaries used in this study are shown with randomly assigned colors for the boundary polygons. The boundary data were downloaded from usgs_3dep_lidar_aws on June 27, 2024.
  • Figure 3: The upper left figure displays the land cover map derived from the Anderson Level 1 classification system, while the upper right figure shows the slope derived from the DEM. The lower left figure presents the slope classification map, and the lower right figure illustrates the locations of the sampled tiles based on our sampling strategy.
  • Figure 4: Random samples of the dataset. Top: point cloud tiles labeled as "Developed". Bottom: point cloud tiles labeled as "Forest". From left to right: point cloud tiles labeled as "Flat", "Sloping", "Steep".
  • Figure 5: Overview of the pre-training and fine-tuning using BEV-MAE.
  • ...and 3 more figures
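
The captions of Figures 3 and 4 refer to a slope map derived from the DEM and to slope classes named "Flat", "Sloping", and "Steep". As a rough illustration of that step, the sketch below computes slope from a gridded DEM with central differences and bins it into those three classes. The thresholds `flat_max` and `sloping_max` are placeholder assumptions; the summary does not state the values the authors used, and the function names are hypothetical.

```python
import math


def slope_degrees(dem, cell_size):
    """Slope in degrees for the interior cells of a DEM given as a list of rows.

    Uses central differences, so the result has two fewer rows and columns
    than the input. `cell_size` is the grid spacing in the same unit as the
    elevations.
    """
    rows, cols = len(dem), len(dem[0])
    slopes = []
    for r in range(1, rows - 1):
        row = []
        for c in range(1, cols - 1):
            dz_dx = (dem[r][c + 1] - dem[r][c - 1]) / (2 * cell_size)
            dz_dy = (dem[r + 1][c] - dem[r - 1][c]) / (2 * cell_size)
            row.append(math.degrees(math.atan(math.hypot(dz_dx, dz_dy))))
        slopes.append(row)
    return slopes


def classify_slope(slope_deg, flat_max=5.0, sloping_max=15.0):
    """Map a slope angle to a class name; both thresholds are assumed values."""
    if slope_deg < flat_max:
        return "Flat"
    if slope_deg < sloping_max:
        return "Sloping"
    return "Steep"


if __name__ == "__main__":
    # A plane rising one unit per cell along x.
    ramp = [[float(c) for c in range(4)] for _ in range(4)]
    for row in slope_degrees(ramp, cell_size=1.0):
        print([classify_slope(s) for s in row])
```

In practice the class of a whole tile would have to be aggregated from its cells, for example by majority vote; that aggregation rule is likewise not specified here.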