ECLAIR: A High-Fidelity Aerial LiDAR Dataset for Semantic Segmentation

Iaroslav Melekhov; Anand Umashankar; Hyeong-Jin Kim; Vladislav Serkov; Dusty Argyle

TL;DR

ECLAIR addresses the shortage of large-scale outdoor aerial LiDAR datasets for 3D semantic segmentation. It provides a high-fidelity, colorized point-cloud dataset covering more than 10 km$^2$ with 11 classes, ground-truth and pseudo-label annotations, and a Minkowski Engine baseline evaluation. The paper details the full data creation pipeline, including helicopter-based capture, tiling, colorization, and manual quality control, and presents a thorough ablation study on input features, loss functions, and architectures. The key findings are that combining intensity with return features and using Focal loss improve segmentation, and that pseudo-labels improve generalization; the Res16UNet14C backbone achieves strong per-class performance. The dataset enables scalable benchmarking for urban infrastructure and utility management and paves the way for future extensions to instance segmentation and larger-area coverage.
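The TL;DR credits Focal loss with improving segmentation. To make concrete what that loss does, here is a minimal pure-Python sketch of the standard multi-class focal loss, $FL(p_t) = -\alpha_t (1 - p_t)^\gamma \log p_t$, where $p_t$ is the predicted probability of the true class. The default $\gamma = 2$, the optional per-class weights `alpha`, and the helper names are illustrative assumptions, not the settings or code used in the ECLAIR baseline.

```python
import math
from typing import List, Optional, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one point's class logits."""
    m = max(logits)
    log_sum = math.log(sum(math.exp(z - m) for z in logits))
    return [z - m - log_sum for z in logits]


def focal_loss(
    logits: Sequence[Sequence[float]],
    labels: Sequence[int],
    gamma: float = 2.0,  # assumed focusing parameter, not taken from the paper
    alpha: Optional[Sequence[float]] = None,  # hypothetical per-class weights
) -> float:
    """Mean multi-class focal loss: -alpha_t * (1 - p_t)^gamma * log(p_t).

    With gamma = 0 and alpha = None this reduces to plain cross-entropy.
    """
    if len(logits) == 0:
        raise ValueError("need at least one point")
    losses = []
    for point_logits, y in zip(logits, labels):
        log_p_t = log_softmax(point_logits)[y]
        p_t = math.exp(log_p_t)
        weight = alpha[y] if alpha is not None else 1.0
        # (1 - p_t)^gamma shrinks the loss of confidently correct ("easy") points
        losses.append(-weight * (1.0 - p_t) ** gamma * log_p_t)
    return sum(losses) / len(losses)
```

For a confidently correct point (say logits `[5.0, 0.0]` with label 0), the focal term $(1 - p_t)^2$ is below $10^{-4}$, so such points contribute almost nothing and training concentrates on hard or rare-class points.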

Abstract

We introduce ECLAIR (Extended Classification of Lidar for AI Recognition), a new outdoor large-scale aerial LiDAR dataset designed specifically to advance research in point cloud semantic segmentation. As the most extensive and diverse collection of its kind to date, the dataset covers a total area of 10 km$^2$ with close to 600 million points and features eleven distinct object categories. To guarantee the dataset's quality and utility, the point labels have been thoroughly curated by an internal team of experts, ensuring accurate and consistent semantic labeling. By presenting new challenges and potential applications, the dataset is intended to move forward the fields of 3D urban modeling, scene understanding, and utility infrastructure management. As a benchmark, we report a qualitative and quantitative analysis of a voxel-based point cloud segmentation approach built on the Minkowski Engine.
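The benchmark model is voxel-based: sparse-convolution frameworks such as the Minkowski Engine operate on integer voxel coordinates rather than raw points, so the cloud is quantized first. The plain-Python sketch below illustrates that step under stated assumptions. The voxel size is a free parameter (the value used for ECLAIR is not given here), one representative point is kept per voxel, and an `inverse` mapping copies per-voxel predictions back to every original point. The function `voxelize` is hypothetical and is not the Minkowski Engine API, which ships its own quantization utilities.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


def voxelize(
    points: Sequence[Point], voxel_size: float
) -> Tuple[List[Voxel], List[int], List[int]]:
    """Quantize points onto a regular grid (hypothetical helper).

    Returns:
      voxels      unique integer voxel coordinates, in first-seen order
      first_index for each voxel, the index of the first point that fell into it
      inverse     for each input point, the index of its voxel in `voxels`
    """
    if voxel_size <= 0:
        raise ValueError("voxel_size must be positive")
    voxel_id: Dict[Voxel, int] = {}
    voxels: List[Voxel] = []
    first_index: List[int] = []
    inverse: List[int] = []
    for i, (x, y, z) in enumerate(points):
        # floor (not int()) so negative coordinates map to the correct cell
        key = (
            math.floor(x / voxel_size),
            math.floor(y / voxel_size),
            math.floor(z / voxel_size),
        )
        if key not in voxel_id:
            voxel_id[key] = len(voxels)
            voxels.append(key)
            first_index.append(i)
        inverse.append(voxel_id[key])
    return voxels, first_index, inverse
```

After a network predicts one label per voxel, `[voxel_pred[j] for j in inverse]` assigns a label to every original point, which is what point-level metrics require.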

Paper Structure

This paper contains 16 sections, 7 figures, and 2 tables.

Introduction
Related Work
Semantic Understanding of 3D Areas
3D Semantic Learning
ECLAIR: Dataset Creation
Data Capture
Data Processing
Class Specifications
Data Quality Control
Data Visualization
Experimental Evaluation
Statistics of ECLAIR
Metrics
Ablation Studies
3D Semantic Understanding
...and 1 more section

Figures (7)

Figure 1: Overview of the proposed ECLAIR dataset. We introduce ECLAIR, a new outdoor large-scale aerial LiDAR dataset. It covers a total area of more than 10 square kilometers and encompasses 11 semantic classes. Accurate annotations, including those of long-tail classes, enable fine-grained semantic understanding. Each semantic class is shown in a different color.
Figure 2: Point cloud visualization. The CORE viewer combines a view of the point cloud colored by classification, a map view, and an image view. The point clouds and images also show the vector data of objects in 3D and 2D, respectively.
Figure 3: Point cloud inspection. 3D Point Cloud Navigation/Editing View provided by CORE.
Figure 4: Component inventory. Detection and inventory of specific components in CORE based on multimodal data inputs.
Figure 5: The distribution of semantic classes. We report the total number of points for each semantic category, revealing a strong class imbalance in the proposed dataset (note the logarithmic scale of the horizontal axis).
...and 2 more figures
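Figure 5 shows per-class point counts on a logarithmic axis, which reveals a strong imbalance. One common way to quantify such imbalance, and to derive per-class loss weights from it, is inverse-frequency weighting. The sketch below computes each class's point count, fraction, $\log_{10}$ count (the quantity a log-scale bar chart displays), and an inverse-frequency weight rescaled to average 1 across the classes present. This is a generic illustration: the source does not state that the ECLAIR baseline uses inverse-frequency weights, and the toy labels are made up.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def class_statistics(labels: Iterable[int]) -> Dict[int, Dict[str, float]]:
    """Per-class count, fraction, log10(count), and normalized inverse-frequency weight.

    Classes that never occur in `labels` receive no entry (and no weight).
    """
    counts = Counter(labels)
    if not counts:
        raise ValueError("labels must be non-empty")
    total = sum(counts.values())
    # Inverse frequency: rare classes get large raw weights.
    inverse_freq = {c: total / n for c, n in counts.items()}
    # Rescale so the weights average to 1 over the classes present.
    scale = len(counts) / sum(inverse_freq.values())
    return {
        c: {
            "count": float(n),
            "fraction": n / total,
            "log10_count": math.log10(n),
            "weight": inverse_freq[c] * scale,
        }
        for c, n in sorted(counts.items())
    }
```

On a toy split of 900, 90, and 10 points, the rarest class ends up with a weight 90 times that of the most frequent one, illustrating how quickly such weights grow under the kind of imbalance shown in Figure 5.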