ARKit LabelMaker: A New Scale for Indoor 3D Scene Understanding

Guangda Ji; Silvan Weder; Francis Engelmann; Marc Pollefeys; Hermann Blum

TL;DR

This paper tackles the data bottleneck in real-world 3D indoor scene understanding by introducing ARKit LabelMaker, the largest automatically labeled real-world 3D dataset with 186 semantic classes created from ARKitScenes using LabelMakerV2. It demonstrates that large-scale auto-labeled real-world data provides substantial gains for 3D semantic segmentation, improving both vanilla and transformer-based models on ScanNet and ScanNet200, and yielding notable tail-class improvements. The authors further enhance the labeling pipeline with Grounded-SAM integration and gravity alignment, while omitting the expensive NeuS lift to maintain scalability, and show that real-world data can match or exceed synthetic-data benefits, with promising transferability to downstream tasks and zero-shot settings. Collectively, the work provides evidence that scaling real-world auto-labeled 3D data can drive substantial performance gains, and offers a practical data-generation path via mobile integration for broad, scalable 3D perception research.

Abstract

Neural network performance scales with both model size and data volume, as shown in both language and image processing. This requires scaling-friendly architectures and large datasets. While transformers have been adapted for 3D vision, a "GPT-moment" remains elusive due to limited training data. We introduce ARKit LabelMaker, a large-scale real-world 3D dataset with dense semantic annotation that is more than three times larger than the prior largest dataset. Specifically, we extend ARKitScenes with automatically generated dense 3D labels using an extended LabelMaker pipeline, tailored for large-scale pre-training. Training on our dataset improves accuracy across architectures, achieving state-of-the-art 3D semantic segmentation scores on ScanNet and ScanNet200, with notable gains on tail classes. Our code is available at https://labelmaker.org and our dataset at https://huggingface.co/datasets/labelmaker/arkit_labelmaker.
