Multi-dataset synergistic in supervised learning to pre-label structural components in point clouds from shell construction scenes
Lukas Rauch, Thomas Braml
TL;DR
This work tackles the data-annotation bottleneck for 3D point-cloud semantic segmentation on shell-construction sites by evaluating a three-phase approach: (i) a fully supervised baseline evaluated on a dedicated AEC validation set, (ii) cross-domain training on public indoor datasets to test generalization, and (iii) transfer learning, in which backbones pre-trained on indoor data are fine-tuned on domain-specific data. Using three transformer-based architectures, the study shows that substantial segmentation performance can be achieved with limited domain data and that pre-training on datasets such as Structured3D yields notable gains in the transfer setting ($\mathrm{mIoU} \approx 75\%$). Cross-domain transfer, however, exhibits a clear domain gap: many per-class IoUs remain low, although ceilings, walls, and floors consistently segment well when the model is trained on synthetic indoor data. The findings suggest that pre-labeling with pre-trained transformers can substantially reduce manual annotation effort for new shell-construction datasets, supporting scalable annotation for construction-site BIM and robotics workflows, while highlighting the need for larger, openly available AEC-specific datasets. Segmentation performance is reported throughout as per-class intersection over union (IoU) and its class average, $\mathrm{mIoU}$.
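Per-class IoU is $\mathrm{IoU}_c = \frac{TP_c}{TP_c + FP_c + FN_c}$, and $\mathrm{mIoU}$ averages it over the classes. As a minimal sketch (not the authors' evaluation code), assuming per-point integer labels in $[0, C)$ and an ignore label of `-1` for unlabeled points (our assumption), both metrics can be derived from a point-wise confusion matrix. The helper names and the three example classes are hypothetical.

```python
import numpy as np


def confusion_matrix(gt, pred, num_classes: int, ignore_index: int = -1) -> np.ndarray:
    """Point-wise confusion matrix: rows = ground-truth class, columns = predicted class."""
    gt = np.asarray(gt).ravel()
    pred = np.asarray(pred).ravel()
    valid = gt != ignore_index  # drop unlabeled points (assumed ignore label)
    # Encode each (gt, pred) pair as a single index, then count occurrences.
    idx = gt[valid] * num_classes + pred[valid]
    counts = np.bincount(idx, minlength=num_classes * num_classes)
    return counts.reshape(num_classes, num_classes)


def iou_per_class(cm: np.ndarray) -> np.ndarray:
    """IoU_c = TP / (TP + FP + FN); NaN for classes absent from both gt and pred."""
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp  # predicted as c, labeled as another class
    fn = cm.sum(axis=1) - tp  # labeled as c, predicted as another class
    union = tp + fp + fn
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(union > 0, tp / union, np.nan)


def mean_iou(cm: np.ndarray) -> float:
    """Average IoU over classes that actually occur (NaN entries are skipped)."""
    return float(np.nanmean(iou_per_class(cm)))


# Hypothetical example with three classes: 0 = wall, 1 = floor, 2 = ceiling.
gt = [0, 0, 0, 1, 1, 2, 2, -1]    # last point is unlabeled and ignored
pred = [0, 0, 1, 1, 1, 2, 0, 2]
cm = confusion_matrix(gt, pred, num_classes=3)
print(iou_per_class(cm))          # per-class IoU: wall 0.5, floor ~0.667, ceiling 0.5
print(round(mean_iou(cm), 4))     # → 0.5556
```

Skipping classes with an empty union (rather than scoring them as 0) avoids penalizing a scene that simply contains no instance of a class; whether the paper follows this convention is not stated.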
Abstract
The significant effort required to annotate data for new training datasets hinders computer vision and machine learning research in the construction industry. This work explores adapting standard datasets and recent transformer architectures for point cloud semantic segmentation on shell construction sites. Unlike common approaches that focus on segmenting building interiors and furniture, this study addresses the challenge of segmenting complex structural components in Architecture, Engineering, and Construction (AEC). We establish a baseline through supervised training with a custom validation dataset, evaluate cross-domain inference using large-scale indoor datasets, and use transfer learning to maximize segmentation performance with minimal new data. The findings indicate that, with minimal fine-tuning, pre-trained transformer architectures offer an effective strategy for segmenting building components. Our results are promising for automating the annotation of new, previously unseen data when building larger training resources, and for segmenting frequently recurring objects.
