OpenRooms: An End-to-End Open Framework for Photorealistic Indoor Scene Datasets

Zhengqin Li, Ting-Wei Yu, Shen Sang, Sarah Wang, Meng Song, Yuhan Liu, Yu-Ying Yeh, Rui Zhu, Nitesh Gundavarapu, Jia Shi, Sai Bi, Zexiang Xu, Hong-Xing Yu, Kalyan Sunkavalli, Miloš Hašan, Ravi Ramamoorthi, Manmohan Chandraker

TL;DR

OpenRooms addresses the challenge of obtaining rich ground truth for indoor scene understanding by enabling end-to-end generation of photorealistic indoor datasets grounded in real RGB-D scans. The framework attaches high-quality spatially-varying BRDFs (SVBRDFs) and lighting to scans, then uses a GPU-based physically-based renderer to produce large-scale HDR imagery with per-pixel lighting, visibility, and light-source contributions. It details a pipeline for layout reconstruction, material assignment, lighting annotation, and rendering, producing a dataset with over 100k HDR images, per-pixel SVBRDF, and semantic labels suitable for inverse rendering, segmentation, AR, and robotics research. The work demonstrates the utility of the dataset through inverse rendering benchmarks, semantic segmentation pretraining, AR insertions, and robotics simulation with friction ground truth, arguing that OpenRooms and its open tooling can accelerate progress across vision, graphics, and robotics. The authors plan a public release of the dataset and tools to foster community-driven expansion and real-world applicability, including sim-to-real and multi-task learning scenarios.

Abstract

We propose a novel framework for creating large-scale photorealistic datasets of indoor scenes, with ground truth geometry, material, lighting and semantics. Our goal is to make the dataset creation process widely accessible, transforming scans into photorealistic datasets with high-quality ground truth for appearance, layout, semantic labels, high quality spatially-varying BRDF and complex lighting, including direct, indirect and visibility components. This enables important applications in inverse rendering, scene understanding and robotics. We show that deep networks trained on the proposed dataset achieve competitive performance for shape, material and lighting estimation on real images, enabling photorealistic augmented reality applications, such as object insertion and material editing. We also show our semantic labels may be used for segmentation and multi-task learning. Finally, we demonstrate that our framework may also be integrated with physics engines, to create virtual robotics environments with unique ground truth such as friction coefficients and correspondence to real scenes. The dataset and all the tools to create such datasets will be made publicly available.

Paper Structure

This paper contains 72 sections, 3 equations, 37 figures, 15 tables.

Figures (37)

  • Figure 1: Our framework for creating a synthetic dataset of complex indoor scenes with ground truth shape, SVBRDF and SV-lighting, along with the resulting applications. Given possibly noisy scans acquired with a commodity 3D sensor, we generate consistent layouts for room and furniture. We ascribe per-pixel ground truth for material in the form of high-quality SVBRDF and for lighting as spatially-varying physically-based representations. We render a large-scale dataset of images associated with this ground truth, which can be used to train deep networks for inverse rendering and semantic segmentation. We further motivate applications for augmented reality and robotics, while suggesting that the open source tools we make available can be used by the community to create other large-scale datasets too.
  • Figure 2: Images from ScanNet and our corresponding synthetic scene layouts rendered with different materials, different lighting, and different views selected by our algorithm. A video is included in the supplementary. The third row shows the same scene as the second one, but rendered with freely available Substance Share materials [substance] instead of the public but non-free Adobe Stock materials [adobestock].
  • Figure 3: UIs for annotating room layout (Left top) and material category (Right top). (Bottom) Material examples from each category. Please zoom in for better visualization.
  • Figure 4: One of our rendered images with ground-truth geometry, spatially-varying material and segmentation labels.
  • Figure 5: Our ground-truth light source annotations. From left to right: input and for each light source, its instance segmentation, and direct shading with and without occlusion. Our annotations reveal rich information about light transport in indoor scenes.
  • ...and 32 more figures