
InteriorNet: Mega-scale Multi-sensor Photo-realistic Indoor Scenes Dataset

Wenbin Li, Sajad Saeedi, John McCormac, Ronald Clark, Dimos Tzoumanikas, Qing Ye, Yuzhong Huang, Rui Tang, Stefan Leutenegger

TL;DR

InteriorNet addresses the need for scalable, photo-realistic indoor datasets for SLAM and scene understanding by combining a mega-scale library of production-quality furniture models and interior layouts with a fast, physically-based rendering pipeline. The authors introduce a renderer (ExaRenderer) and an interactive simulator (ViSim) to generate 20M RGB images with rich ground truth (semantic labels, depth, 3D bounding boxes, motion data, and event-sensor outputs) under varying lighting and across rearranged scenes. Key contributions include ~1.04M CAD models, 22M interior layouts, physics-based scene changes, learned trajectory generation, and a multi-sensor simulation framework, along with SLAM benchmarking results. The dataset supports training and evaluation at a larger scale and with higher realism than previous indoor datasets, with practical benefits for robotics, augmented reality, and indoor scene understanding.
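The motion data and inertial ground truth listed above can, in principle, be derived from a rendered camera trajectory by differentiating it. The sketch below shows one common approach; it is an assumption, not the authors' IMU model, which may add noise, biases, and full 3D orientation. It assumes a z-up world frame with gravity of 9.81 m/s², a camera that rotates only about yaw, and uniformly sampled poses. An accelerometer senses specific force (acceleration minus gravity) expressed in the body frame, and a gyroscope senses angular rate. `synthesize_imu` is a hypothetical helper name.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]

# Gravity in the world frame, z pointing up (assumption).
GRAVITY: Vec3 = (0.0, 0.0, -9.81)


def synthesize_imu(positions: List[Vec3], yaws: List[float], dt: float):
    """Return (accel, gyro) readings for interior samples 1..N-2.

    Hypothetical helper: assumes the body frame differs from the world
    frame only by a yaw rotation about z, with samples spaced dt seconds apart.
    """
    accel, gyro = [], []
    for k in range(1, len(positions) - 1):
        # Second-order central difference gives world-frame acceleration.
        a_w = [
            (positions[k + 1][i] - 2.0 * positions[k][i] + positions[k - 1][i]) / dt**2
            for i in range(3)
        ]
        # Specific force (a - g) is what an ideal accelerometer measures.
        f_w = [a_w[i] - GRAVITY[i] for i in range(3)]
        # Rotate into the body frame: R_z(yaw)^T @ f_w.
        c, s = math.cos(yaws[k]), math.sin(yaws[k])
        accel.append((c * f_w[0] + s * f_w[1], -s * f_w[0] + c * f_w[1], f_w[2]))
        # Yaw rate via central difference (assumes no wrap-around at +/-pi);
        # roll and pitch rates are zero under the yaw-only assumption.
        gyro.append((0.0, 0.0, (yaws[k + 1] - yaws[k - 1]) / (2.0 * dt)))
    return accel, gyro
```

A stationary camera should read +9.81 m/s² on its z axis and zero angular rate. A realistic simulator would also add white noise and slowly drifting biases, and would use quaternions for general rotations. Those are omitted here to keep the sketch short.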

Abstract

Datasets have gained an enormous amount of popularity in the computer vision community, from training and evaluation of Deep Learning-based methods to benchmarking Simultaneous Localization and Mapping (SLAM). Without a doubt, synthetic imagery bears a vast potential due to scalability in terms of amounts of data obtainable without tedious manual ground truth annotations or measurements. Here, we present a dataset with the aim of providing a higher degree of photo-realism, larger scale, more variability, as well as serving a wider range of purposes compared to existing datasets. Our dataset leverages the availability of millions of professional interior designs and millions of production-level furniture and object assets, all coming with fine geometric details and high-resolution textures. We render high-resolution and high frame-rate video sequences following realistic trajectories while supporting various camera types as well as providing inertial measurements. Together with the release of the dataset, we will make the executable program of our interactive simulator software as well as our renderer available at https://interiornetdataset.github.io. To showcase the usability and uniqueness of our dataset, we show benchmarking results of both sparse and dense SLAM algorithms.
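SLAM benchmarks on datasets like this one are often reported as absolute trajectory error (ATE). The excerpt does not state the paper's exact evaluation protocol, so the following is an assumed, commonly used variant. The estimated camera positions are aligned to ground truth with Umeyama's closed-form least-squares method, and the root-mean-square position error is then reported. The function names `umeyama_alignment` and `ate_rmse` are hypothetical, and the inputs are assumed to be already time-associated `(N, 3)` position arrays.

```python
import numpy as np


def umeyama_alignment(est: np.ndarray, gt: np.ndarray, with_scale: bool = False):
    """Closed-form least-squares transform (s, R, t) with gt ~ s * R @ est + t."""
    n = est.shape[0]
    mu_est, mu_gt = est.mean(axis=0), gt.mean(axis=0)
    est_c, gt_c = est - mu_est, gt - mu_gt
    # 3 x 3 cross-covariance of the centred point sets.
    cov = gt_c.T @ est_c / n
    U, D, Vt = np.linalg.svd(cov)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0
    R = U @ S @ Vt
    if with_scale:
        # Scale = trace(D S) / variance of the estimated positions.
        s = np.trace(np.diag(D) @ S) / (est_c ** 2).sum(axis=1).mean()
    else:
        s = 1.0
    t = mu_gt - s * R @ mu_est
    return s, R, t


def ate_rmse(est, gt, with_scale: bool = False) -> float:
    """Root-mean-square absolute trajectory error after alignment."""
    est = np.asarray(est, dtype=float)
    gt = np.asarray(gt, dtype=float)
    s, R, t = umeyama_alignment(est, gt, with_scale)
    aligned = s * est @ R.T + t  # apply s * R @ p + t to every row p
    return float(np.sqrt(np.mean(np.sum((aligned - gt) ** 2, axis=1))))
```

Pass `with_scale=True` for monocular (sparse) SLAM, whose trajectories are recovered only up to an unknown scale. Stereo or RGB-D (dense) systems are usually evaluated with a rigid alignment, which is the default here.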

Paper Structure

This paper contains 20 sections, 6 figures, 1 table.

Figures (6)

  • Figure 1: Our rendering vs. real decoration guided by our furniture models and layouts.
  • Figure 2: System Overview: an end-to-end pipeline to render an RGB-D-inertial benchmark for large-scale interior scene understanding and mapping. (A) We collect around 1 million CAD furniture models from world-leading furniture manufacturers. These models have been used in real-world production. (B) Based on those models, around 1,100 professional designers/companies created around 22 million interior layouts. Most of these layouts have been used in real-world decorations. (C) For each layout, we generate a number of configurations to represent different lighting conditions and to simulate scene changes over time in daily life. (D) We provide an interactive simulator (ViSim) to create the ground-truth monocular/stereo trajectories, as well as IMU and event camera data. Trajectories can be set manually, or generated using a random walk or a neural network. (E) All supported image sequences and ground-truth data.
  • Figure 3: Statistics on object models, rooms and layouts. (Top): occurrences of the 50 most common object categories in the layouts (blue), and number of objects in our database (red). (Bottom Left): distribution of the number of rooms per layout. (Bottom Middle): distribution of the number of objects per layout. (Bottom Right): distribution of room types.
  • Figure 4: Our rendered images and the associated NYU40 labels.
  • Figure 5: Lighting setup: natural and random color collections, brightness, and color temperature.
  • ...and 1 more figure
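Figure 2(D) mentions generating camera trajectories by random walk, but the excerpt gives no details of the scheme. The sketch below is therefore a hypothetical stand-in: a heading-correlated random walk inside an empty, axis-aligned rectangular room at a fixed camera height. Walls reflect the walker, and furniture and collisions are ignored. The function name, room size, camera height, step length, and turning noise are all illustrative assumptions.

```python
import math
import random
from typing import List, Tuple


def random_walk_trajectory(
    n_steps: int,
    room: Tuple[float, float] = (5.0, 4.0),  # hypothetical floor size in metres
    height: float = 1.5,                      # hypothetical camera height (m)
    step: float = 0.05,                       # distance moved per frame (m)
    turn_sigma: float = 0.2,                  # std-dev of heading change (rad)
    seed: int = 0,
) -> List[Tuple[float, float, float, float]]:
    """Return (x, y, z, yaw) poses of a heading-correlated random walk."""
    w, d = room
    if n_steps < 1 or 2.0 * step > min(w, d):
        raise ValueError("need n_steps >= 1 and step <= half the smaller room side")
    rng = random.Random(seed)
    x, y = w / 2.0, d / 2.0  # start in the middle of the room
    yaw = rng.uniform(-math.pi, math.pi)
    poses = [(x, y, height, yaw)]
    for _ in range(n_steps - 1):
        # Perturb the current heading instead of drawing a fresh one each step,
        # which yields smoother, more camera-like paths.
        yaw += rng.gauss(0.0, turn_sigma)
        nx, ny = x + step * math.cos(yaw), y + step * math.sin(yaw)
        # Reflect off an x-wall: mirror the heading so its x component flips.
        if not 0.0 <= nx <= w:
            yaw = math.pi - yaw
            nx = x + step * math.cos(yaw)
        # Reflect off a y-wall: negate the heading so its y component flips.
        if not 0.0 <= ny <= d:
            yaw = -yaw
            ny = y + step * math.sin(yaw)
        x, y = nx, ny
        # Wrap yaw into [-pi, pi] for downstream consumers.
        yaw = math.atan2(math.sin(yaw), math.cos(yaw))
        poses.append((x, y, height, yaw))
    return poses
```

Correlating successive headings trades exploration against smoothness. A small `turn_sigma` produces long, gently curving paths, while a large one approaches an uncorrelated walk. The step-length guard ensures that a single reflection always brings the camera back inside the room.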