MuSHRoom: Multi-Sensor Hybrid Room Dataset for Joint 3D Reconstruction and Novel View Synthesis

Xuqian Ren, Wenjia Wang, Dingding Cai, Tuuli Tuominen, Juho Kannala, Esa Rahtu

TL;DR

MuSHRoom tackles the lack of real-world benchmarks for jointly optimizing 3D reconstruction and novel view synthesis (NVS) in room-scale scenes. It introduces a multi-sensor dataset collected with Azure Kinect and iPhone, augmented by ground-truth Faro meshes, and a practical training/testing protocol that mimics VR/AR use. The paper benchmarks several reconstruction and rendering pipelines and finds that current methods struggle to achieve high geometric accuracy and photorealistic NVS at the same time under realistic noise and occlusions. By providing data, pipelines, and an end-to-end evaluation framework, MuSHRoom aims to catalyze the development of robust, consumer-device-friendly approaches for integrated 3D modeling and rendering in real-world indoor environments.

Abstract

Metaverse technologies demand accurate, real-time, and immersive modeling on consumer-grade hardware for both non-human perception (e.g., drone/robot/autonomous car navigation) and immersive technologies like AR/VR, requiring both structural accuracy and photorealism. However, there exists a knowledge gap in how to apply geometric reconstruction and photorealism modeling (novel view synthesis) in a unified framework. To address this gap and promote the development of robust and immersive modeling and rendering with consumer-grade devices, we propose a real-world Multi-Sensor Hybrid Room Dataset (MuSHRoom). Our dataset presents exciting challenges and requires state-of-the-art methods to be cost-effective, robust to noisy data and devices, and able to jointly learn 3D reconstruction and novel view synthesis instead of treating them as separate tasks, making them ideal for real-world applications. We benchmark several well-known pipelines on our dataset for joint 3D mesh reconstruction and novel view synthesis. Our dataset and benchmark show great potential for promoting improvements in fusing 3D reconstruction and high-quality rendering in a robust and computationally efficient end-to-end fashion. The dataset and code are available at the project website: https://xuqianren.github.io/publications/MuSHRoom/.

Paper Structure (15 sections, 4 figures, 2 tables)

This paper contains 15 sections, 4 figures, 2 tables.

Figures (4)

  • Figure 1: The proposed MuSHRoom dataset includes 10 rooms captured by the consumer devices Kinect and iPhone, and each room provides ground-truth mesh models obtained by a Faro scanner. Both Kinect and iPhone capture one long and one short RGB-D sequence to simulate a typical VR/AR use case. The MuSHRoom dataset provides camera poses and point clouds for the Kinect and iPhone sequences. The dashed lines show the approximate capture trajectories. This dataset is intended for benchmarking room-scale 3D reconstruction and novel view synthesis.
  • Figure 2: The processing pipeline. We use a Faro scanner to obtain point clouds of the room from different locations and stitch them to create a complete model of the room, compensating for occluded areas. We use the Spectacular AI SDK to extract the undistorted RGB-D frames and camera poses for Kinect sequences, and use a z-buffer to project point clouds into pixel coordinates to in-paint the raw depth. iPhone sequences are processed with Polycam and registered using its camera poses. The long and short captures of each consumer device are registered with global registration and further refined by COLMAP (Schönberger and Frahm, 2016) bundle adjustment.
  • Figure 3: The challenges observed in the MuSHRoom dataset.
  • Figure 4: Qualitative comparison of different methods on the MuSHRoom dataset. We visualize mesh and novel view synthesis quality.
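
The Figure 2 caption mentions projecting point clouds into pixel coordinates with a z-buffer to in-paint the raw depth. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes an undistorted pinhole camera with intrinsics `fx`, `fy`, `cx`, `cy`, points already expressed in the camera frame, and a raw depth map in which 0 marks invalid pixels. The function names `zbuffer_depth` and `inpaint_depth` are hypothetical, and which point cloud is projected is not specified in the caption.

```python
import math


def zbuffer_depth(points_cam, fx, fy, cx, cy, width, height):
    """Render a depth map from camera-frame 3D points using a z-buffer.

    points_cam: iterable of (x, y, z) with z pointing forward (assumed).
    Returns a height x width list of lists; 0.0 marks pixels no point hit.
    """
    depth = [[math.inf] * width for _ in range(height)]
    for x, y, z in points_cam:
        if z <= 0.0:  # point is behind the camera, skip it
            continue
        # Pinhole projection to the nearest pixel centre.
        u = int(round(fx * x / z + cx))
        v = int(round(fy * y / z + cy))
        # Z-buffer test: keep only the nearest surface per pixel.
        if 0 <= u < width and 0 <= v < height and z < depth[v][u]:
            depth[v][u] = z
    return [[0.0 if math.isinf(d) else d for d in row] for row in depth]


def inpaint_depth(raw_depth, rendered_depth):
    """Fill invalid (zero) raw-depth pixels with the rendered depth."""
    return [
        [raw if raw > 0.0 else ren for raw, ren in zip(raw_row, ren_row)]
        for raw_row, ren_row in zip(raw_depth, rendered_depth)
    ]


# Toy example: two points fall on the same pixel, the nearer one wins.
points = [(0.0, 0.0, 2.0), (0.0, 0.0, 1.0), (0.5, 0.0, 1.0)]
rendered = zbuffer_depth(points, fx=2.0, fy=2.0, cx=1.0, cy=1.0, width=3, height=3)
raw = [[0.0, 0.0, 0.0], [0.0, 1.5, 0.0], [0.0, 0.0, 0.0]]
filled = inpaint_depth(raw, rendered)
```

In this toy case the valid raw measurement at the centre pixel is kept and only the missing neighbour is filled from the rendered depth. A practical version would vectorise the loop and handle point splatting or hole filling, which this sketch leaves out.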