ECLAIR: A High-Fidelity Aerial LiDAR Dataset for Semantic Segmentation
Iaroslav Melekhov, Anand Umashankar, Hyeong-Jin Kim, Vladislav Serkov, Dusty Argyle
TL;DR
ECLAIR addresses the shortage of large-scale outdoor aerial LiDAR datasets for 3D semantic segmentation by providing a high-fidelity, colorized point-cloud dataset covering more than $10\,\mathrm{km}^2$ with 11 classes, ground-truth and pseudo-label annotations, and a Minkowski Engine baseline evaluation. The paper details a full data creation pipeline, including helicopter-based capture, tiling, colorization, and manual quality control, plus a thorough ablation study on features, losses, and architectures. Key findings show that combining intensity with return features and using focal loss improves segmentation, while pseudo-labels enhance generalization; the Res16UNet14C backbone achieves strong per-class performance. This dataset enables scalable benchmarking for urban infrastructure and utility management and paves the way for future extensions to instance segmentation and larger-area coverage.
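To make the focal loss mentioned above concrete, here is a minimal NumPy sketch of the standard multi-class form, $-(1 - p_t)^\gamma \log p_t$. It is an illustration only, not the authors' training code: the function name `focal_loss` and the default `gamma=2.0` are assumptions, and class weighting is omitted.

```python
import numpy as np

def focal_loss(logits, labels, gamma=2.0):
    """Mean multi-class focal loss: -(1 - p_t)^gamma * log(p_t).

    logits: (N, C) array of unnormalized per-point class scores.
    labels: (N,) array of integer class indices.
    gamma:  focusing parameter (2.0 is an assumed default, not from the paper).
    """
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Log-probability assigned to the true class of each point.
    log_pt = log_probs[np.arange(labels.shape[0]), labels]
    pt = np.exp(log_pt)
    # Down-weight well-classified points (pt close to 1).
    return float(np.mean(-((1.0 - pt) ** gamma) * log_pt))
```

With `gamma=0` the expression reduces to ordinary cross-entropy; larger values shift the loss toward hard, often under-represented, points.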
Abstract
We introduce ECLAIR (Extended Classification of Lidar for AI Recognition), a new large-scale outdoor aerial LiDAR dataset designed specifically to advance research in point cloud semantic segmentation. As the most extensive and diverse collection of its kind to date, the dataset covers a total area of $10\,\mathrm{km}^2$ with close to 600 million points and features eleven distinct object categories. To guarantee the dataset's quality and utility, an internal team of experts has thoroughly curated the point labels, ensuring accurate and consistent semantic labeling. The dataset is designed to advance the fields of 3D urban modeling, scene understanding, and utility infrastructure management by presenting new challenges and potential applications. As a benchmark, we report a qualitative and quantitative analysis of a voxel-based point cloud segmentation approach based on the Minkowski Engine.
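As a rough illustration of what "voxel-based" means here, the sketch below quantizes point coordinates onto a regular grid and keeps one point per occupied voxel, which is the kind of preprocessing sparse convolutional networks typically rely on. It uses plain NumPy rather than the Minkowski Engine API, and the function name `voxelize` and the voxel size are hypothetical choices, not values from the paper.

```python
import numpy as np

def voxelize(points, voxel_size):
    """Quantize points (N, 3) to integer voxel coordinates.

    Returns the unique occupied voxel coordinates and, for each voxel,
    the index of the first input point that falls into it.
    """
    # Integer grid cell of each point.
    coords = np.floor(points / voxel_size).astype(np.int64)
    # Deduplicate cells; return_index gives one representative point per voxel.
    unique_coords, first_idx = np.unique(coords, axis=0, return_index=True)
    return unique_coords, first_idx

# Usage: three points, two of which share a 1 m voxel (assumed size).
points = np.array([[0.1, 0.1, 0.1],
                   [0.2, 0.2, 0.2],
                   [1.5, 0.0, 0.0]])
voxels, keep = voxelize(points, voxel_size=1.0)
```

Per-point features such as intensity or return number could then be gathered with `keep` to form the sparse input tensor.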
