Geometry-guided Feature Learning and Fusion for Indoor Scene Reconstruction
Ruihong Yin, Sezer Karaoglu, Theo Gevers
TL;DR
This work tackles indoor scene reconstruction from RGB images by exploiting geometry beyond the feature level, where existing methods stop. It introduces a three-level geometry integration mechanism: geometry-guided feature learning (G2FL) embeds geometric priors into multi-view features, geometry-guided adaptive feature fusion (G2AFF) weights views using occlusion and pose cues, and a consistent 3D normal loss (C3NL) enforces local consistency between 2D and 3D normals. Together, these components improve TSDF-based volumetric reconstruction, achieving state-of-the-art results on ScanNet and generalizing well to 7-Scenes and TUM RGB-D. The approach yields smoother planar surfaces and more accurate geometry, and can be integrated into existing online or offline volumetric pipelines.
Abstract
In addition to color and texture, geometry provides important cues for 3D scene reconstruction. However, current reconstruction methods include geometry only at the feature level and therefore do not fully exploit the geometric information. In contrast, this paper proposes a novel geometry integration mechanism for 3D scene reconstruction. Our approach incorporates 3D geometry at three levels: feature learning, feature fusion, and network supervision. First, geometry-guided feature learning encodes geometric priors to capture view-dependent information. Second, geometry-guided adaptive feature fusion uses the geometric priors as guidance to adaptively generate weights for multiple views. Third, at the supervision level, a consistent 3D normal loss takes the consistency between 2D and 3D normals into account to add local constraints. Large-scale experiments on the ScanNet dataset show that volumetric methods equipped with our geometry integration mechanism outperform state-of-the-art methods both quantitatively and qualitatively. They also generalize well to the 7-Scenes and TUM RGB-D datasets.
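To make the idea of per-view fusion weights concrete, the following is a minimal sketch of weighted multi-view feature fusion, not the paper's G2AFF module. It assumes that some upstream network has already produced one scalar score per view and voxel from geometric cues; the function name `fuse_views`, the array shapes, and the softmax normalization are illustrative assumptions rather than details taken from the paper.

```python
import numpy as np

def fuse_views(features, scores, valid):
    """Fuse per-view voxel features with softmax weights (illustrative only).

    features: (V, N, C) features back-projected from V views into N voxels
    scores:   (V, N) hypothetical per-view scores, e.g. predicted from geometric cues
    valid:    (V, N) boolean, True where voxel n projects inside view v
    Returns the fused features (N, C) and the weights (V, N).
    """
    # Views that do not see a voxel must not contribute to it.
    masked = np.where(valid, scores, -np.inf)

    # Numerically stable softmax over the view axis.
    max_score = np.max(masked, axis=0, keepdims=True)
    # Voxels seen by no view have max == -inf; use 0 to avoid inf - inf.
    max_score = np.where(np.isfinite(max_score), max_score, 0.0)
    exp = np.where(valid, np.exp(masked - max_score), 0.0)
    denom = exp.sum(axis=0, keepdims=True)

    # Unseen voxels keep zero weights instead of dividing by zero.
    weights = np.divide(exp, denom, out=np.zeros_like(exp), where=denom > 0)

    # Weighted sum over views.
    fused = (weights[..., None] * features).sum(axis=0)
    return fused, weights


# Two views, three voxels, one feature channel.
features = np.array([[[1.0], [3.0], [7.0]],
                     [[3.0], [5.0], [9.0]]])
scores = np.zeros((2, 3))
valid = np.array([[True, True, False],
                  [True, False, False]])
fused, weights = fuse_views(features, scores, valid)
```

With equal scores, as in the usage lines above, the weights reduce to a plain average over the views that see each voxel. Learned, geometry-dependent scores would instead let views with more reliable observations dominate.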
