
Sparis: Neural Implicit Surface Reconstruction of Indoor Scenes from Sparse Views

Yulun Wu, Han Huang, Wenyuan Zhang, Chao Deng, Ge Gao, Ming Gu, Yu-Shen Liu

TL;DR

Indoor scenes captured from only a few views are hard to reconstruct, and methods that rely on monocular priors suffer most. Sparis combines a VolSDF-based neural implicit surface with inter-image depth priors derived from 2D feature matching, and adds cross-view reprojection plus two matching-optimization strategies (an angular filter and epipolar weighting) to enforce geometric consistency. On ScanNet and Replica, it reports substantial improvements in F-score and depth accuracy under sparse views and produces more complete, smoother surfaces than prior methods. The approach reduces reliance on dense inputs and is more robust to matching noise, which makes it practical for real-world indoor reconstruction.

Abstract

In recent years, reconstructing indoor scene geometry from multi-view images has made encouraging progress. Current methods incorporate monocular priors into neural implicit surface models to achieve high-quality reconstructions. However, these methods require hundreds of images per scene. When only a limited number of views are available, the monocular priors degrade because of scale ambiguity, and the reconstructed scene geometry collapses. In this paper, we propose a new method, named Sparis, for indoor surface reconstruction from sparse views. Specifically, we investigate the impact of monocular priors on sparse scene reconstruction and introduce a novel prior based on inter-image matching information. Our prior offers more accurate depth information while ensuring cross-view matching consistency. Additionally, we employ an angular filter strategy and an epipolar matching weight function to reduce errors caused by inaccurate view matching, thereby refining the inter-image prior for improved reconstruction accuracy. Experiments on widely used benchmarks demonstrate superior performance in sparse-view scene reconstruction.
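
As a rough illustration of how an epipolar matching weight can down-weight unreliable matches, the sketch below measures how far a matched pixel lies from the epipolar line implied by the relative camera pose, then maps that distance to a weight in (0, 1]. It is not the weight function used in the paper: the Gaussian form, the `sigma_px` parameter, the shared pinhole intrinsics `(fx, fy, cx, cy)`, the pose convention X2 = R·X1 + t, and all function names are assumptions made for this example.

```python
import math


def matmul3(A, B):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def matvec3(M, v):
    """Product of a 3x3 matrix and a 3-vector."""
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]


def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) v = t x v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty], [tz, 0.0, -tx], [-ty, tx, 0.0]]


def epipolar_distance(p1, p2, intr, R, t):
    """Pixel distance from p2 (view 2) to the epipolar line of p1 (view 1).

    Assumes both views share pinhole intrinsics intr = (fx, fy, cx, cy)
    and that the relative pose maps view-1 points as X2 = R X1 + t.
    """
    fx, fy, cx, cy = intr
    # Back-project p1 to normalized camera coordinates.
    x1 = [(p1[0] - cx) / fx, (p1[1] - cy) / fy, 1.0]
    # Essential matrix E = [t]_x R; the line in normalized view-2 coords is E x1.
    a, b, c = matvec3(matmul3(skew(t), R), x1)
    # Convert the line to pixel coordinates (multiply by K^-T).
    la, lb = a / fx, b / fy
    lc = c - a * cx / fx - b * cy / fy
    norm = math.hypot(la, lb)
    if norm == 0.0:
        raise ValueError("degenerate epipolar line (zero baseline?)")
    return abs(la * p2[0] + lb * p2[1] + lc) / norm


def epipolar_weight(dist_px, sigma_px=2.0):
    """Hypothetical Gaussian weight: 1 on the epipolar line, decaying with distance."""
    return math.exp(-dist_px ** 2 / (2.0 * sigma_px ** 2))
```

A match that lies on its epipolar line keeps weight 1, while one that drifts several pixels away contributes much less to whatever loss the weight multiplies. The value of `sigma_px` sets how quickly that trust falls off and would need tuning against the matcher's typical error.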

Paper Structure

This paper contains 23 sections, 19 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: Surface reconstruction results from sparse views of an indoor scene. Our method, Sparis, avoids problems seen in prior methods, such as missing reconstruction details (NeuRIS), uneven surfaces (DäRF), and spatial noise (VolRecon).
  • Figure 2: Overview of Sparis. Given sparse indoor images, 3D surfaces are reconstructed in two stages: (1) Pre-processing: a pre-trained normal prediction network $f_\theta$ estimates normal maps, and a feature matching network $f_\phi$ produces matching pixel pairs; (2) Training with priors: the neural rendering procedure is optimized with inter-image depth priors, cross-view reprojection, and monocular normal priors, generating complete and detailed geometry.
  • Figure 3: Illustration of matching priors. (a) Using matching pixel pairs, we obtain the triangulated depth $\widetilde{D}$ and the reprojected coordinates $\bm{p}^\prime$ from the rendered 3D surface points; (b) Mismatches cause depth estimation errors, especially when the translation and angular change between the two views are small.
  • Figure 4: Visual comparisons of 3D reconstruction results on ScanNet with sparse views. The overall top views and the zoom-in views of the marked areas show that our approach produces more complete and fine-grained geometry.
  • Figure 5: Visual comparisons of 3D reconstruction results on Replica with sparse views.
  • ...and 1 more figure
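
The effect described in the Figure 3 caption, where a mismatch hurts triangulated depth most when the two views barely differ, can be reproduced with a small sketch. The code below triangulates a depth from one matched pixel pair using the midpoint of the closest points on the two viewing rays, and reports the angle between the rays. It is an illustration under simplifying assumptions, not the paper's procedure: both cameras are axis-aligned (identity rotation) with shared pinhole intrinsics, midpoint triangulation stands in for whatever triangulation the authors use, and every name, the 3.0-unit test depth, and the 1-pixel mismatch are hypothetical.

```python
import math

INTR = (500.0, 500.0, 320.0, 240.0)  # assumed (fx, fy, cx, cy)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pixel_ray(u, v, intr=INTR):
    """Unit viewing direction through pixel (u, v) for an axis-aligned camera."""
    fx, fy, cx, cy = intr
    d = ((u - cx) / fx, (v - cy) / fy, 1.0)
    n = math.sqrt(dot(d, d))
    return tuple(x / n for x in d)


def triangulate_midpoint(c1, d1, c2, d2):
    """Midpoint of the closest points on rays c1 + s*d1 and c2 + t*d2.

    Returns None when the rays are (nearly) parallel.
    """
    w = tuple(p - q for p, q in zip(c1, c2))
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if denom < 1e-12:
        return None
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    q1 = tuple(p + s * x for p, x in zip(c1, d1))
    q2 = tuple(p + t * x for p, x in zip(c2, d2))
    return tuple(0.5 * (x + y) for x, y in zip(q1, q2))


def triangulated_depth(p1, p2, c1, c2):
    """Depth along view 1's z-axis and the ray angle (degrees) for one match."""
    d1, d2 = pixel_ray(*p1), pixel_ray(*p2)
    point = triangulate_midpoint(c1, d1, c2, d2)
    if point is None:
        return None, 0.0
    angle = math.degrees(math.acos(max(-1.0, min(1.0, dot(d1, d2)))))
    return point[2] - c1[2], angle


def depth_error_for_baseline(baseline, true_depth=3.0, mismatch_px=1.0):
    """Depth error caused by shifting the view-2 match by mismatch_px pixels."""
    fx, _, cx, cy = INTR
    c1, c2 = (0.0, 0.0, 0.0), (baseline, 0.0, 0.0)
    # The true point (0, 0, true_depth) projects to these pixels.
    p1 = (cx, cy)
    p2 = (cx + fx * (0.0 - baseline) / true_depth + mismatch_px, cy)
    depth, angle = triangulated_depth(p1, p2, c1, c2)
    return abs(depth - true_depth), angle
```

Calling `depth_error_for_baseline(0.05)` and `depth_error_for_baseline(0.6)` shows the same 1-pixel mismatch producing a depth error roughly an order of magnitude larger for the narrow baseline, where the ray angle is under one degree. That sensitivity is one reason to filter or down-weight matches by viewing angle; the criterion and threshold to use are left open here.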