OpenRooms: An End-to-End Open Framework for Photorealistic Indoor Scene Datasets
Zhengqin Li, Ting-Wei Yu, Shen Sang, Sarah Wang, Meng Song, Yuhan Liu, Yu-Ying Yeh, Rui Zhu, Nitesh Gundavarapu, Jia Shi, Sai Bi, Zexiang Xu, Hong-Xing Yu, Kalyan Sunkavalli, Miloš Hašan, Ravi Ramamoorthi, Manmohan Chandraker
TL;DR
OpenRooms addresses the difficulty of obtaining rich ground truth for indoor scene understanding by enabling end-to-end generation of photorealistic indoor datasets grounded in real RGB-D scans. The framework assigns high-quality SVBRDFs (spatially-varying BRDFs) and lighting to the scans, then uses a GPU-based physically based renderer to produce large-scale HDR imagery with per-pixel lighting, visibility, and light-source contributions. The pipeline covers layout reconstruction, material assignment, lighting annotation, and rendering, and yields a dataset of over 100k HDR images with per-pixel SVBRDF and semantic labels, suitable for inverse rendering, segmentation, AR, and robotics research. The work demonstrates the dataset's utility through inverse rendering benchmarks, semantic segmentation pretraining, AR object insertion, and robotics simulation with friction ground truth, and argues that the open tooling can accelerate progress across vision, graphics, and robotics. The authors plan to release the dataset and tools publicly to foster community-driven expansion and real-world applicability, including sim-to-real and multi-task learning scenarios.
Abstract
We propose a novel framework for creating large-scale photorealistic datasets of indoor scenes, with ground truth geometry, material, lighting and semantics. Our goal is to make the dataset creation process widely accessible, transforming scans into photorealistic datasets with high-quality ground truth for appearance, layout, semantic labels, spatially-varying BRDF and complex lighting, including direct, indirect and visibility components. This enables important applications in inverse rendering, scene understanding and robotics. We show that deep networks trained on the proposed dataset achieve competitive performance for shape, material and lighting estimation on real images, enabling photorealistic augmented reality applications, such as object insertion and material editing. We also show that our semantic labels may be used for segmentation and multi-task learning. Finally, we demonstrate that our framework may also be integrated with physics engines to create virtual robotics environments with unique ground truth such as friction coefficients and correspondence to real scenes. The dataset and all the tools to create such datasets will be made publicly available.
