Leveraging Automatic CAD Annotations for Supervised Learning in 3D Scene Understanding
Yuchen Rao, Stefan Ainetter, Sinisa Stekovic, Vincent Lepetit, Friedrich Fraundorfer
TL;DR
This paper addresses the scarcity of high-quality 3D annotations for indoor scenes with an automatic CAD annotation pipeline that extends HOC-Search to ScanNet++ v1. The result is SCANnotate++, which provides CAD models and 9D poses for over 5k objects in 280 scans. The authors show that supervised models for point cloud completion and single-view CAD model retrieval and alignment, when trained on these automatic annotations, outperform baselines trained on manually annotated data, and that adding more automatically annotated data further improves performance. They also show that the learned models generalize to ScanNet++ and that pretraining on automatic annotations improves results. The work concludes with the release of SCANnotate++ and the trained models to support further research in 3D scene understanding and annotation-efficient learning.
Abstract
High-level 3D scene understanding is essential in many applications. However, the difficulty of generating accurate 3D annotations hinders the development of deep learning models. We turn to recent advances in automatic retrieval of synthetic CAD models and show that data generated by such methods can serve as high-quality ground truth for training supervised deep learning models. More precisely, we employ a pipeline akin to the one previously used to automatically annotate objects in ScanNet scenes with their 9D poses and CAD models. This time, we apply it to the recent ScanNet++ v1 dataset, which previously lacked such annotations. Our findings demonstrate not only that it is possible to train deep learning models on these automatically obtained annotations, but also that the resulting models outperform those trained on manually annotated data. We validate this on two distinct tasks: point cloud completion, and single-view CAD model retrieval and alignment. Our results underscore the potential of automatic 3D annotations to enhance model performance while significantly reducing annotation costs. To support future research in 3D scene understanding, we will release our annotations, which we call SCANnotate++, along with our trained models.
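To make the notion of a 9D pose concrete, the following minimal sketch assumes the common convention in CAD alignment that a 9D pose consists of a 3D translation, a 3D rotation, and a 3D anisotropic scale. The axis-angle rotation parametrization and the function names `rotation_matrix` and `apply_pose_9d` are illustrative assumptions, not the format used by SCANnotate++.

```python
import math


def rotation_matrix(rx, ry, rz):
    """Rotation matrix from an axis-angle vector (Rodrigues' formula).

    The axis-angle parametrization is an assumption made for this sketch.
    """
    theta = math.sqrt(rx * rx + ry * ry + rz * rz)
    if theta < 1e-12:
        # No rotation: return the identity.
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    x, y, z = rx / theta, ry / theta, rz / theta
    c, s = math.cos(theta), math.sin(theta)
    t = 1.0 - c
    return [
        [c + x * x * t, x * y * t - z * s, x * z * t + y * s],
        [y * x * t + z * s, c + y * y * t, y * z * t - x * s],
        [z * x * t - y * s, z * y * t + x * s, c + z * z * t],
    ]


def apply_pose_9d(points, translation, rotation_vec, scale):
    """Map CAD-model points into the scene frame: p' = R (s * p) + t.

    points: list of [x, y, z] in the CAD model's canonical frame.
    translation, rotation_vec, scale: three values each (9 in total).
    """
    R = rotation_matrix(*rotation_vec)
    transformed = []
    for p in points:
        # Per-axis scale first, then rotation, then translation.
        scaled = [scale[i] * p[i] for i in range(3)]
        rotated = [sum(R[r][k] * scaled[k] for k in range(3)) for r in range(3)]
        transformed.append([rotated[r] + translation[r] for r in range(3)])
    return transformed
```

Under these assumptions, a CAD vertex at (1, 0, 0) with scale (2, 1, 1), a 90-degree rotation about the z-axis, and translation (0, 0, 1) lands at approximately (0, 2, 1) in the scene.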
