Indoor Scene Reconstruction with Fine-Grained Details Using Hybrid Representation and Normal Prior Enhancement

Sheng Ye; Yubin Hu; Matthieu Lin; Yu-Hui Wen; Wang Zhao; Yong-Jin Liu; Wenping Wang

TL;DR

This work tackles high-fidelity indoor scene reconstruction from posed RGB images by addressing two core challenges: the limited expressiveness of single-MLP implicit representations for high-frequency details, and unreliable normal priors in complex regions. It introduces a hybrid geometry architecture that combines an MLP branch with tri-plane features to jointly model low-frequency layout and fine-grained details, along with a sharpening/denoising pipeline and a pixel-wise uncertainty module to robustly leverage surface normals. An uncertainty-weighted normal-prior loss, aided by RGB, normals, and DINO features, guides the reconstruction toward accurate geometries while avoiding misleading priors. Extensive experiments on ScanNet, Replica, and real-world captures demonstrate state-of-the-art geometry quality, improved fine-detail recovery, and good generalization, with efficient resource usage and robust performance under noisy poses.
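To make the idea of an uncertainty-weighted normal-prior loss concrete, here is a minimal sketch in plain Python. It is not the authors' formulation: the function name `uncertainty_weighted_normal_loss`, the weight `1 - u`, and the choice of an L1 plus cosine error are all illustrative assumptions. It only shows how a per-pixel uncertainty value could reduce the influence of an unreliable normal prior.

```python
import math


def normalize(v):
    """Return the 3-vector v scaled to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [c / norm for c in v]


def uncertainty_weighted_normal_loss(rendered, prior, uncertainty):
    """Mean per-pixel normal error, down-weighted where the prior is uncertain.

    rendered, prior: lists of 3D normals, one per pixel.
    uncertainty: list of values in [0, 1]; 1 means the prior is unreliable.
    Assumption (not from the paper): weight = 1 - u, error = L1 + (1 - cosine).
    """
    if not rendered or not (len(rendered) == len(prior) == len(uncertainty)):
        raise ValueError("inputs must be non-empty and of equal length")

    total = 0.0
    for n_r, n_p, u in zip(rendered, prior, uncertainty):
        n_r, n_p = normalize(n_r), normalize(n_p)
        l1 = sum(abs(a - b) for a, b in zip(n_r, n_p))
        cosine = sum(a * b for a, b in zip(n_r, n_p))
        weight = 1.0 - u  # hypothetical weighting scheme
        total += weight * (l1 + (1.0 - cosine))
    return total / len(rendered)
```

With this weighting, a pixel whose prior is marked fully uncertain (`u = 1`) contributes nothing to the loss, so a wrong prior there cannot pull the reconstructed surface away from the geometry supported by the other supervision signals.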

Abstract

The reconstruction of indoor scenes from multi-view RGB images is challenging due to the coexistence of flat and texture-less regions alongside delicate and fine-grained regions. Recent methods leverage neural radiance fields aided by predicted surface normal priors to recover the scene geometry. These methods excel in producing complete and smooth results for floor and wall areas. However, they struggle to capture complex surfaces with high-frequency structures because the neural representation lacks capacity and the predicted normal priors are inaccurate. This work aims to reconstruct high-fidelity surfaces with fine-grained details by addressing the above limitations. To improve the capacity of the implicit representation, we propose a hybrid architecture to represent low-frequency and high-frequency regions separately. To enhance the normal priors, we introduce a simple yet effective image sharpening and denoising technique, coupled with a network that estimates the pixel-wise uncertainty of the predicted surface normal vectors. Identifying such uncertainty prevents our model from being misled by unreliable surface normal supervision, which hinders the accurate reconstruction of intricate geometries. Experiments on benchmark datasets show that our method outperforms existing methods in terms of reconstruction quality. Furthermore, the proposed method generalizes well to real-world indoor scenarios captured with hand-held mobile phones. Our code is publicly available at: https://github.com/yec22/Fine-Grained-Indoor-Recon.
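The hybrid representation can be pictured with a small sketch. The abstract does not say how the two branches are combined, so everything below is an assumption made for illustration: a coarse function stands in for the MLP branch, three scalar 2D grids stand in for the tri-plane features, and their bilinearly interpolated sum is added as a scaled residual. The names `bilinear`, `triplane_feature`, `hybrid_sdf`, and `detail_scale` are hypothetical.

```python
def bilinear(grid, u, v):
    """Bilinearly interpolate a 2D grid (list of rows) at (u, v) in [0, 1]^2.

    u indexes columns and v indexes rows; out-of-range inputs are clamped.
    """
    rows, cols = len(grid), len(grid[0])
    x = min(max(u, 0.0), 1.0) * (cols - 1)
    y = min(max(v, 0.0), 1.0) * (rows - 1)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, cols - 1), min(y0 + 1, rows - 1)
    tx, ty = x - x0, y - y0
    top = grid[y0][x0] * (1 - tx) + grid[y0][x1] * tx
    bottom = grid[y1][x0] * (1 - tx) + grid[y1][x1] * tx
    return top * (1 - ty) + bottom * ty


def triplane_feature(planes, point):
    """Project a 3D point onto the XY, XZ, and YZ planes and sum the samples."""
    x, y, z = point
    return (bilinear(planes["xy"], x, y)
            + bilinear(planes["xz"], x, z)
            + bilinear(planes["yz"], y, z))


def hybrid_sdf(coarse_sdf, planes, point, detail_scale=0.1):
    """Coarse (low-frequency) distance plus a tri-plane (high-frequency) residual.

    Assumption: the additive combination and detail_scale are illustrative only.
    """
    return coarse_sdf(point) + detail_scale * triplane_feature(planes, point)
```

In this toy setup, all-zero planes leave the coarse surface unchanged, while non-zero plane values perturb it locally. That mirrors the stated division of labor between low-frequency layout and fine-grained details, without claiming to match the actual architecture.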

Paper Structure (30 sections, 9 equations, 15 figures, 6 tables)

Introduction
Related Works
Neural Surface Representation
Indoor Scene Reconstruction
Uncertainty Estimation
Method
Preliminary
Hybrid Representation for Geometry
Prior Enhancement
Image Sharpening and Denoising Techniques
Uncertainty Estimation Module
Loss Functions
Experiments
Experiment Details
Datasets
...and 15 more sections

Figures (15)

Figure 1: Our proposed method can reconstruct fine and accurate indoor scenes from only a sequence of posed RGB images.
Figure 2: The overall pipeline of our proposed approach. We tackle high-frequency regions in scene reconstruction from the perspective of both the representation and normal priors. In particular, we propose a hybrid geometry representation to enhance the expressive power, and an image preprocessing technique along with pixel-wise uncertainty to enhance the normal priors.
Figure 3: Comparison of the reconstructed meshes generated by different representations. Zoom in for better visualization.
Figure 4: We propose an uncertainty estimation module to predict the pixel-wise uncertainty maps of the normal priors.
Figure 5: Qualitative comparison of reconstructed meshes with other baselines on ScanNet dataset.
...and 10 more figures
