Room Envelopes: A Synthetic Dataset for Indoor Layout Reconstruction from Images
Sam Bahrami, Dylan Campbell
TL;DR
This paper addresses the incompleteness of indoor scene reconstructions caused by occlusion. It proposes Room Envelopes, a synthetic dataset that provides two pointmaps per view: a visible-surface map and a first-layout (structural) surface map. This dual representation allows direct supervision of feed-forward monocular layout estimators, exploiting the planar and regular nature of room layouts. Using a model built on a MoGe backbone and comparing against MoGe and LaRI, the authors demonstrate improved reconstruction of occluded layout geometry and show strong qualitative results, including on in-the-wild images. The dataset and findings offer a practical path to more complete indoor geometry understanding, with potential applications in robotic navigation and augmented reality.
Abstract
Modern scene reconstruction methods are able to accurately recover 3D surfaces that are visible in one or more images. However, this leads to incomplete reconstructions, missing all occluded surfaces. While much progress has been made on reconstructing entire objects given partial observations using generative models, the structural elements of a scene, like the walls, floors and ceilings, have received less attention. We argue that these scene elements should be relatively easy to predict, since they are typically planar, repetitive and simple, and so less costly approaches may be suitable. In this work, we present a synthetic dataset -- Room Envelopes -- that facilitates progress on this task by providing a set of RGB images and two associated pointmaps for each image: one capturing the visible surface and one capturing the first surface once fittings and fixtures are removed, that is, the structural layout. As we show, this enables direct supervision for feed-forward monocular geometry estimators that predict both the first visible surface and the first layout surface. This confers an understanding of the scene's extent, as well as the shape and location of its objects.
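To make the two-pointmap supervision concrete, the following is a minimal sketch in NumPy, not the authors' implementation. It assumes each sample carries a visible-surface pointmap and a layout pointmap of shape (H, W, 3) in camera coordinates, plus a validity mask. The function names, the masked L1 loss, and the distance-based occlusion test are all illustrative assumptions; the abstract does not specify the training objective.

```python
import numpy as np

def masked_l1(pred, target, mask):
    """Mean absolute error between two (H, W, 3) pointmaps over pixels where mask is True."""
    if not mask.any():
        return 0.0
    return float(np.abs(pred - target)[mask].mean())

def dual_pointmap_loss(pred_visible, pred_layout, gt_visible, gt_layout, valid):
    """Sum of the visible-surface and layout-surface losses (assumed equal weighting)."""
    return (masked_l1(pred_visible, gt_visible, valid)
            + masked_l1(pred_layout, gt_layout, valid))

def occluded_layout_mask(gt_visible, gt_layout, valid, eps=1e-3):
    """Pixels where the layout surface is farther from the camera than the visible surface."""
    d_visible = np.linalg.norm(gt_visible, axis=-1)
    d_layout = np.linalg.norm(gt_layout, axis=-1)
    return valid & (d_layout > d_visible + eps)

# Toy 2x2 view: a flat wall at z = 4, with an object halfway along the ray at pixel (0, 0).
gt_layout = np.array([[[0.0, 0.0, 4.0], [1.0, 0.0, 4.0]],
                      [[0.0, 1.0, 4.0], [1.0, 1.0, 4.0]]])
gt_visible = gt_layout.copy()
gt_visible[0, 0] = 0.5 * gt_layout[0, 0]
valid = np.ones((2, 2), dtype=bool)

occluded = occluded_layout_mask(gt_visible, gt_layout, valid)

# A predictor that copies the visible surface into the layout output is penalised
# only at the occluded pixel.
naive_loss = dual_pointmap_loss(gt_visible, gt_visible, gt_visible, gt_layout, valid)
perfect_loss = dual_pointmap_loss(gt_visible, gt_layout, gt_visible, gt_layout, valid)
```

In this toy view the two pointmaps agree wherever the layout is directly visible, so the layout term contributes signal only where fittings or fixtures hide the structure. That is the region a visible-surface-only estimator cannot be supervised on.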
