Indoor Scene Reconstruction with Fine-Grained Details Using Hybrid Representation and Normal Prior Enhancement
Sheng Ye, Yubin Hu, Matthieu Lin, Yu-Hui Wen, Wang Zhao, Yong-Jin Liu, Wenping Wang
TL;DR
This work tackles high-fidelity indoor scene reconstruction from posed RGB images by addressing two core challenges: the limited capacity of single-MLP implicit representations to express high-frequency details, and the unreliability of predicted normal priors in complex regions. It introduces a hybrid geometry architecture that combines an MLP branch with tri-plane features to jointly model the low-frequency layout and fine-grained details, along with an image sharpening and denoising pipeline and a pixel-wise uncertainty module that make the use of surface normal priors more robust. An uncertainty-weighted normal-prior loss, informed by RGB, normal, and DINO features, steers the reconstruction toward accurate geometry without being misled by unreliable priors. Experiments on ScanNet, Replica, and real-world captures report state-of-the-art geometry quality, improved recovery of fine details, and good generalization, together with efficient resource usage and robustness to noisy poses.
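To make the hybrid idea concrete, the following is a minimal NumPy sketch of a tri-plane feature lookup added to a coarse coordinate-based term. It is an illustration under stated assumptions, not the authors' implementation: the function names, the summation of the three plane features, the `tanh` stand-in for the MLP branch, and the linear decoders `w_coarse` and `w_detail` are all hypothetical.

```python
import numpy as np

def bilinear_sample(plane, uv):
    """Bilinearly sample a feature plane of shape (R, R, C) at uv in [0, 1]^2."""
    res = plane.shape[0]
    xy = np.clip(uv, 0.0, 1.0) * (res - 1)   # continuous grid coordinates
    i0 = np.floor(xy).astype(int)
    i0 = np.minimum(i0, res - 2)             # keep the upper neighbour in range
    w = xy - i0                              # fractional offsets, shape (N, 2)
    x0, y0 = i0[:, 0], i0[:, 1]
    wx, wy = w[:, :1], w[:, 1:]
    low = plane[x0, y0] * (1 - wx) + plane[x0 + 1, y0] * wx
    high = plane[x0, y0 + 1] * (1 - wx) + plane[x0 + 1, y0 + 1] * wx
    return low * (1 - wy) + high * wy

def triplane_features(planes, points):
    """Sum features from the XY, XZ and YZ planes for points in [0, 1]^3.

    Summation is an assumption; concatenation is another common choice.
    """
    xy, xz, yz = planes
    return (bilinear_sample(xy, points[:, [0, 1]])
            + bilinear_sample(xz, points[:, [0, 2]])
            + bilinear_sample(yz, points[:, [1, 2]]))

def hybrid_sdf(points, planes, w_coarse, w_detail):
    """Toy hybrid geometry: a smooth coarse term plus a tri-plane detail term."""
    coarse = np.tanh(points @ w_coarse)                    # stand-in for the MLP branch
    detail = triplane_features(planes, points) @ w_detail  # stand-in for a detail decoder
    return coarse + detail
```

The design intuition is that the coordinate-based branch is biased toward smooth, low-frequency output suited to walls and floors, while the grid-aligned planes store local features that can vary sharply from cell to cell.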
Abstract
The reconstruction of indoor scenes from multi-view RGB images is challenging due to the coexistence of flat, texture-less regions and delicate, fine-grained regions. Recent methods leverage neural radiance fields aided by predicted surface normal priors to recover the scene geometry. These methods excel at producing complete and smooth results for floor and wall areas. However, they struggle to capture complex surfaces with high-frequency structures because the neural representation lacks capacity and the predicted normal priors are inaccurate. This work aims to reconstruct high-fidelity surfaces with fine-grained details by addressing these limitations. To improve the capacity of the implicit representation, we propose a hybrid architecture that represents low-frequency and high-frequency regions separately. To enhance the normal priors, we introduce a simple yet effective image sharpening and denoising technique, coupled with a network that estimates the pixel-wise uncertainty of the predicted surface normal vectors. Identifying such uncertainty prevents our model from being misled by unreliable surface normal supervision, which hinders the accurate reconstruction of intricate geometries. Experiments on benchmark datasets show that our method outperforms existing methods in terms of reconstruction quality. Furthermore, the proposed method generalizes well to real-world indoor scenarios captured with our hand-held mobile phones. Our code is publicly available at: https://github.com/yec22/Fine-Grained-Indoor-Recon.
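As a rough illustration of how pixel-wise uncertainty can down-weight unreliable normal supervision, here is a minimal NumPy sketch. The exact loss used in the paper is not given in this section, so the form below (an angular term plus an L1 term, scaled by one minus the uncertainty) and the function names are assumptions made for illustration only.

```python
import numpy as np

def normalize(v, eps=1e-8):
    """Scale vectors along the last axis to (approximately) unit length."""
    return v / (np.linalg.norm(v, axis=-1, keepdims=True) + eps)

def uncertainty_weighted_normal_loss(rendered, prior, uncertainty):
    """Hypothetical normal-prior loss with per-pixel down-weighting.

    rendered, prior: arrays of shape (N, 3) holding surface normals.
    uncertainty: array of shape (N,), assumed to lie in [0, 1], where 1 means
    the prior at that pixel should be ignored.
    """
    n_r, n_p = normalize(rendered), normalize(prior)
    angular = 1.0 - np.sum(n_r * n_p, axis=-1)       # 0 when aligned, 2 when opposite
    l1 = np.abs(n_r - n_p).sum(axis=-1)              # component-wise disagreement
    weight = 1.0 - np.clip(uncertainty, 0.0, 1.0)    # confident pixels count more
    return float(np.mean(weight * (angular + l1)))
```

With this weighting, a pixel whose prior points the wrong way contributes nothing once its uncertainty reaches 1, so thin or intricate structures where the normal predictor fails are left to be supervised by the remaining loss terms.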
