3DSES: an indoor Lidar point cloud segmentation dataset with real and pseudo-labels from a 3D model
Maxime Mérizette, Nicolas Audebert, Pierre Kervella, Jérôme Verdun
TL;DR
3DSES addresses the shortage of labeled dense indoor TLS data by providing a high-density colorized point cloud dataset with per-point semantic labels, LiDAR intensity, and a BIM-ready 3D CAD model. The authors introduce a model-to-cloud alignment pipeline that converts CAD models into pseudo-labels with over 95% accuracy, enabling large-scale labeling with reduced manual effort. With Gold, Silver, and Bronze variants, the dataset shows that pseudo-labels can approach real-label performance on many classes, while intensity improves discrimination for several models, particularly on the larger variants. The dataset supports BIM-oriented tasks, scan-to-BIM workflows, and digital-twin research, and the authors provide open data and code to foster further work on indoor point cloud understanding.
Abstract
Semantic segmentation of indoor point clouds has found various applications in the creation of digital twins for robotics, navigation and building information modeling (BIM). However, most existing datasets of labeled indoor point clouds have been acquired by photogrammetry. In contrast, Terrestrial Laser Scanning (TLS) can acquire dense sub-centimeter point clouds and has become the standard for surveyors. We present 3DSES (3D Segmentation of ESGT point clouds), a new dataset of indoor dense TLS colorized point clouds covering 427 m² of an engineering school. 3DSES has a unique double annotation format: semantic labels annotated at the point level alongside a full 3D CAD model of the building. We introduce a model-to-cloud algorithm for automated labeling of indoor point clouds using an existing 3D CAD model. 3DSES has three variants of varying semantic and geometric complexity. We show that our model-to-cloud alignment can produce pseudo-labels on our point clouds with over 95% accuracy, allowing us to train deep models with significant time savings compared to manual labeling. First baselines on 3DSES show the difficulties encountered by existing models when segmenting objects relevant to BIM, such as light and safety utilities. We show that segmentation accuracy can be improved by leveraging pseudo-labels and Lidar intensity, information that is rarely considered in current datasets. Code and data will be open sourced.
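To make the idea of model-derived pseudo-labels concrete, the following is a minimal sketch and not the authors' model-to-cloud algorithm, whose details are not given in this abstract. It assumes the CAD model has already been aligned with the scan and sampled into labeled 3D points, and it transfers each sample's class to nearby scanned points by nearest-neighbor search. The function names `transfer_labels` and `pseudo_label_accuracy`, the 5 cm distance threshold, and the `-1` "unlabeled" code are all hypothetical choices made for illustration.

```python
from math import dist


def transfer_labels(cloud_points, model_samples, model_labels,
                    max_distance=0.05, unlabeled=-1):
    """Give each scanned point the label of its nearest model sample.

    Hypothetical helper: points farther than `max_distance` (in metres)
    from every model sample keep the `unlabeled` code.
    Brute force, O(N * M); a real pipeline would use a spatial index.
    """
    labels = []
    for point in cloud_points:
        best_label, best_distance = unlabeled, max_distance
        for sample, label in zip(model_samples, model_labels):
            d = dist(point, sample)
            if d <= best_distance:
                best_label, best_distance = label, d
        labels.append(best_label)
    return labels


def pseudo_label_accuracy(pseudo_labels, real_labels):
    """Fraction of points whose pseudo-label equals the manual label."""
    if len(pseudo_labels) != len(real_labels):
        raise ValueError("label lists must have the same length")
    if not real_labels:
        raise ValueError("label lists must not be empty")
    matches = sum(p == r for p, r in zip(pseudo_labels, real_labels))
    return matches / len(real_labels)


if __name__ == "__main__":
    # Toy data: one sample on a wall (class 0), one on the floor (class 1).
    model_samples = [(0.0, 1.0, 1.0), (1.0, 1.0, 0.0)]
    model_labels = [0, 1]
    # Three scanned points: near the wall, near the floor, and a piece
    # of clutter (class 2) that the CAD model does not contain.
    cloud = [(0.01, 1.0, 1.0), (1.0, 1.0, 0.02), (2.0, 2.0, 1.5)]
    real = [0, 1, 2]

    pseudo = transfer_labels(cloud, model_samples, model_labels)
    print(pseudo)
    print(pseudo_label_accuracy(pseudo, real))
```

In this toy case the clutter point stays unlabeled because nothing in the model lies within the threshold, which illustrates why pseudo-labels derived from a CAD model can disagree with manual labels on objects the model omits.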
