
Multi-dataset synergistic in supervised learning to pre-label structural components in point clouds from shell construction scenes

Lukas Rauch, Thomas Braml

TL;DR

This work tackles the data annotation bottleneck for 3D point-cloud semantic segmentation on shell-construction sites by evaluating a three-phase approach: (i) a fully supervised baseline trained and tested on a dedicated AEC validation set, (ii) cross-domain inference with models trained on public indoor datasets to test generalization, and (iii) transfer learning, in which backbones pre-trained on indoor data are fine-tuned on domain-specific data. Using three transformer-based architectures, the study demonstrates that substantial segmentation performance can be achieved with limited domain data, and that pre-training on datasets like Structured3D yields notable gains in transfer scenarios ($mIoU \approx 75\%$). However, cross-domain inference exhibits a domain gap: many class IoUs remain low, although ceilings, walls, and floors segment consistently well when the model is trained on synthetic indoor data. The findings suggest that pre-labeling with pre-trained transformers can significantly reduce manual annotation effort for new shell-construction datasets, supporting scalable construction-site BIM annotation and robotics workflows, while highlighting the need for larger, open AEC-specific datasets for further improvements. Segmentation performance is quantified throughout with per-class IoU and $mIoU$.
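For readers unfamiliar with the metric, the per-class IoU and $mIoU$ figures quoted above follow the standard definition $IoU_c = TP_c / (TP_c + FP_c + FN_c)$, averaged over classes. The sketch below illustrates that definition on per-point labels; it is not the authors' evaluation code, and the function names, the class ids (0 = wall, 1 = floor, 2 = ceiling), and the toy labels are assumptions made for illustration only.

```python
from collections import Counter


def per_class_iou(y_true, y_pred, num_classes):
    """Per-class IoU over per-point labels; None for classes absent from both."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1      # point correctly assigned to class t
        else:
            fp[p] += 1      # point wrongly assigned to class p
            fn[t] += 1      # point of class t that was missed
    ious = []
    for c in range(num_classes):
        union = tp[c] + fp[c] + fn[c]
        ious.append(tp[c] / union if union else None)
    return ious


def mean_iou(ious):
    """Average IoU over the classes that actually occur."""
    valid = [v for v in ious if v is not None]
    return sum(valid) / len(valid) if valid else float("nan")


# Hypothetical toy labels: 0 = wall, 1 = floor, 2 = ceiling.
y_true = [0, 0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 2, 0, 2]
ious = per_class_iou(y_true, y_pred, num_classes=3)
print(ious, mean_iou(ious))
```

Skipping classes that occur in neither the ground truth nor the prediction is one common convention; evaluation toolkits differ on this point, so reported $mIoU$ values are only comparable when the same convention is used.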

Abstract

The significant effort required to annotate data for new training datasets hinders computer vision research and machine learning in the construction industry. This work explores adapting standard datasets and the latest transformer model architectures for point cloud semantic segmentation in the context of shell construction sites. Unlike common approaches focused on object segmentation of building interiors and furniture, this study addresses the challenges of segmenting complex structural components in Architecture, Engineering, and Construction (AEC). We establish a baseline through supervised training on a custom validation dataset, evaluate cross-domain inference with models trained on large-scale indoor datasets, and use transfer learning to maximize segmentation performance with minimal new data. The findings indicate that with minimal fine-tuning, pre-trained transformer architectures offer an effective strategy for building component segmentation. Our results are promising for automating the annotation of new, previously unseen data when creating larger training resources and for the segmentation of frequently recurring objects.

Paper Structure

This paper contains 18 sections, 8 figures, 7 tables.

Figures (8)

  • Figure 1: One-hot encoding of the occurrence and overlap of object classes of our validation dataset and four established training datasets S3DIS, ScanNet V2, Structured3D, and VASAD. The listed datasets are used by us for training models for the semantic segmentation of component groups.
  • Figure 2: Spherical Point Cloud Renderings of three rooms from the custom validation dataset, collected at a residential apartment building site during shell construction. a) Medium-sized room before plastering work is completed. b) Staircase before plastering work is completed. c) Medium-sized room after plastering work is completed. The scenes are characterized by varying surface textures due to the nature of the construction, challenging lighting conditions, and complex floor plans. They include obstacles and wet spots on the floor, which produce reflections and scanning artifacts. Yellow pixels represent empty canvas pixels where no points were projected.
  • Figure 3: Statistical Evaluation of the mean points-per-class distribution and mean instances-per-class distribution per scene in the validation dataset, plotted on a logarithmic scale. The uncertainty bars represent the standard deviation of the statistical sample distribution between the 36 scenes.
  • Figure 4: Concept of transfer learning, in which the knowledge from the previous training is repurposed to improve the downstream tasks.
  • Figure 5: Baseline Test Results for 3D Semantic Segmentation. This figure presents the inference results from the baseline training experiment using three model architectures: Point Transformer V2, Point Transformer V3, and SWIN3D. Each model was trained and tested on our custom validation dataset, which focuses on shell construction site scenes. The columns display five representative screenshots per model, with predicted class labels uniquely colored to illustrate the segmentation performance across different architectural components.
  • ...and 3 more figures