SCRREAM: SCan, Register, REnder And Map: A Framework for Annotating Accurate and Dense 3D Indoor Scenes with a Benchmark
HyunJun Jung, Weihang Li, Shun-Cheng Wu, William Bittner, Nikolas Brasch, Jifei Song, Eduardo Pérez-Pellitero, Zhensong Zhang, Arthur Moreau, Nassir Navab, Benjamin Busam
TL;DR
Indoor 3D datasets traditionally sacrifice ground-truth accuracy for scale, which hinders dense geometry evaluation. SCRREAM introduces a Scan, Register, Render, and Map pipeline that scans objects at high resolution, registers them into real rooms, renders realistic scenes, and maps real video frames to obtain precise camera poses and dense ground-truth depth. The approach yields publicly available data and benchmarks across four tasks (indoor reconstruction/SLAM, object removal, human reconstruction, and 6D pose estimation), enabling rigorous geometric evaluation against accurate ground-truth depth. The framework lowers the barrier to objective, high-fidelity benchmarking of dense 3D methods and narrows the gap between synthetic accuracy and real-world scenes, supporting advances in novel view synthesis (NVS) and SLAM research.
Abstract
Traditionally, 3D indoor datasets have prioritized scale over ground-truth accuracy in order to improve generalization. However, using these datasets to evaluate dense geometry tasks, such as depth rendering, can be problematic: the meshes in these datasets are often incomplete and may yield incorrect ground truth when fine details are evaluated. In this paper, we propose SCRREAM, a dataset annotation framework that allows annotation of fully dense meshes of the objects in a scene and registers camera poses on the real image sequence, producing accurate ground truth for both sparse and dense 3D tasks. We describe the dataset annotation pipeline in detail and showcase four possible dataset variants that can be obtained from our framework, with example scenes: indoor reconstruction and SLAM, scene editing and object removal, human reconstruction, and 6D pose estimation. Recent pipelines for indoor reconstruction and SLAM serve as new benchmarks. In contrast to previous indoor datasets, our design allows dense geometry tasks to be evaluated on eleven sample scenes against accurately rendered ground-truth depth maps.
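To illustrate what evaluating a dense geometry task against a rendered ground-truth depth map can look like in practice, the following is a minimal sketch using common depth-estimation metrics (absolute relative error, RMSE, and the δ < 1.25 inlier ratio). These metrics, the function name `depth_metrics`, and the `min_depth` threshold are assumptions chosen for illustration; they are not taken from the paper and do not represent the benchmark's official evaluation protocol.

```python
import numpy as np


def depth_metrics(pred, gt, min_depth=1e-3):
    """Compare a predicted depth map with a ground-truth depth map.

    Hypothetical helper: pixels where either map is non-finite or not
    greater than `min_depth` are treated as invalid and ignored.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    if pred.shape != gt.shape:
        raise ValueError("pred and gt must have the same shape")

    # Keep only pixels with usable depth in both maps.
    valid = (
        np.isfinite(gt) & np.isfinite(pred)
        & (gt > min_depth) & (pred > min_depth)
    )
    if not valid.any():
        raise ValueError("no valid pixels to evaluate")

    p, g = pred[valid], gt[valid]

    abs_rel = np.mean(np.abs(p - g) / g)    # mean absolute relative error
    rmse = np.sqrt(np.mean((p - g) ** 2))   # root mean squared error
    ratio = np.maximum(p / g, g / p)        # symmetric per-pixel depth ratio
    delta1 = np.mean(ratio < 1.25)          # fraction of pixels within 25%

    return {
        "abs_rel": float(abs_rel),
        "rmse": float(rmse),
        "delta1": float(delta1),
    }
```

With complete ground-truth depth, the validity mask removes few pixels, so scores of this kind reflect the whole image rather than only the regions where a scanned mesh happens to exist.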
