Geometry-guided Feature Learning and Fusion for Indoor Scene Reconstruction

Ruihong Yin; Sezer Karaoglu; Theo Gevers

TL;DR

This work tackles indoor scene reconstruction from RGB images by using geometry beyond the feature level. It introduces a three-level geometry integration mechanism: geometry-guided feature learning (G2FL) embeds geometric priors into multi-view features, geometry-guided adaptive feature fusion (G2AFF) weights views using occlusion and pose cues, and a consistent 3D normal loss (C3NL) enforces local geometric consistency between 2D and 3D normals. Together, these components improve TSDF-based volumetric reconstruction, achieving state-of-the-art results on ScanNet and generalizing well to 7-Scenes and TUM RGB-D. The approach yields smoother planar surfaces and more accurate geometry, and it can be integrated into existing online or offline volumetric pipelines.

Abstract

In addition to color and texture, geometry provides important cues for 3D scene reconstruction. However, current reconstruction methods include geometry only at the feature level and therefore do not fully exploit the geometric information. In contrast, this paper proposes a novel geometry integration mechanism for 3D scene reconstruction. Our approach incorporates 3D geometry at three levels: feature learning, feature fusion, and network supervision. First, geometry-guided feature learning encodes geometric priors that carry view-dependent information. Second, geometry-guided adaptive feature fusion uses the geometric priors as guidance to adaptively generate weights for multiple views. Third, at the supervision level, a consistent 3D normal loss is designed to add local constraints, taking the consistency between 2D and 3D normals into account. Large-scale experiments on the ScanNet dataset show that volumetric methods equipped with our geometry integration mechanism outperform state-of-the-art methods both quantitatively and qualitatively. They also generalize well to the 7-Scenes and TUM RGB-D datasets.
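
The adaptive fusion idea in the abstract (per-view weights generated under geometric guidance) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `adaptive_fuse`, the linear scoring that stands in for the learned MLP, the softmax normalization over views, and the parameters `w_feat`, `w_pose`, and `w_occ` are all assumptions made for illustration.

```python
import numpy as np

def adaptive_fuse(features, pose_dist, occlusion, w_feat, w_pose, w_occ):
    """Fuse per-view voxel features with weights guided by pose and occlusion.

    features:  (V, N, C) features of N voxels seen from V views
    pose_dist: (V,)      relative pose distance of each view (assumed scalar)
    occlusion: (V, N)    occlusion prior per view and voxel (assumed in [0, 1])
    w_feat:    (C,)      hypothetical scoring weights replacing the learned MLP
    w_pose, w_occ: hypothetical scalar weights for the two geometric cues
    """
    # One score per (view, voxel); a real model would learn this mapping.
    scores = features @ w_feat + w_pose * pose_dist[:, None] + w_occ * occlusion
    # Softmax over the view axis (assumed normalization), stabilized by the max.
    scores = scores - scores.max(axis=0, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=0, keepdims=True)
    # Weighted sum over views gives one fused feature per voxel.
    fused = (weights[..., None] * features).sum(axis=0)
    return fused, weights

rng = np.random.default_rng(0)
feats = rng.normal(size=(3, 5, 4))
fused, weights = adaptive_fuse(
    feats, np.array([0.1, 0.5, 1.0]), rng.uniform(size=(3, 5)),
    rng.normal(size=4), -1.0, -2.0,
)
```

With all scoring weights set to zero, the weights become uniform and the fusion reduces to plain averaging over views, which is the non-adaptive baseline this kind of weighting is meant to improve on.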

Paper Structure (19 sections, 13 equations, 9 figures, 9 tables)

Introduction
Related work
3D scene reconstruction
Geometric priors in 3D scene reconstruction
Multi-view feature fusion
Method
Geometry-guided feature learning
Geometry-guided adaptive feature fusion
Consistent 3D normal loss
Experiments
Datasets and metrics
Implementation details
Evaluation results
Ablation study
Conclusion
...and 4 more sections

Figures (9)

Figure 1: Pipeline of existing volumetric methods compared to our proposed geometry-guided feature learning and fusion for 3D scene reconstruction. Our approach (green parts) integrates view-dependent and local geometry into (1) feature learning, (2) multi-view feature fusion, and (3) network supervision.
Figure 2: Details of our proposed geometry-guided feature learning and geometry-guided adaptive feature fusion. (a) Geometry-guided feature learning: After 2D visual feature learning, view-dependent geometric priors (e.g. surface normal and viewing direction) are encoded and fused into the visual features of the multi-view volume using an MLP, linear layers, and Transformers. (b) Geometry-guided adaptive feature fusion: Fusion weights are adaptively learned by an MLP with the guidance of features, relative pose distances, and occlusion priors.
Figure 3: Boundary and consistency analysis of our proposed 3D normal loss. (a) RGB images. (b) 2D surface normals predicted by a pre-trained normal network [bae2021estimating]. (c) 2D boundary masks. White regions are planes, which are retained for normal loss computation. (d) The projected normal $\widetilde{\textbf{N}}$ is the 3D normal back-projected from the 2D normal. (e) The 3D normal $\textbf{N}$ is generated from the ground truth of the TSDF and shows noise near the boundaries. (f) Cosine similarity between (d) and (e). Blue points in the red circle indicate that the angles between (d) and (e) are greater than $90^{\circ}$.
Figure 4: Qualitative results on ScanNet. Colors on the meshes are related to surface normals. Compared to other methods, NeuralRecon + ours reconstructs more regions, smoother planes, and more accurate geometric relationships.
Figure 5: Visual comparison for the ablation study. Our proposed geometry-guided feature learning (G2FL), geometry-guided adaptive feature fusion (G2AFF), and consistent 3D normal loss (C3NL) all contribute to improved reconstruction quality.
...and 4 more figures
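
The comparison described in the Figure 3 caption (3D normals derived from a TSDF, compared by cosine similarity against reference normals inside a plane mask) can be sketched as follows. This is a hedged illustration only: the helper names `tsdf_normals` and `masked_normal_loss`, the finite-difference gradient, and the `1 - cos` penalty are assumptions, and the paper's actual loss formulation may differ.

```python
import numpy as np

def tsdf_normals(tsdf, voxel_size=1.0, eps=1e-8):
    """Approximate unit normals as the normalized spatial gradient of a TSDF grid."""
    gx, gy, gz = np.gradient(tsdf, voxel_size)  # finite differences per axis
    n = np.stack([gx, gy, gz], axis=-1)
    return n / (np.linalg.norm(n, axis=-1, keepdims=True) + eps)

def masked_normal_loss(n_a, n_b, mask):
    """Mean (1 - cosine similarity) over voxels where mask is True.

    The mask plays the role of the plane mask in the caption: boundary
    voxels, where the normals are noisy, are excluded from the loss.
    """
    if not mask.any():
        return 0.0
    cos = np.sum(n_a * n_b, axis=-1)
    return float(np.mean(1.0 - cos[mask]))

# Toy TSDF of a horizontal plane at z = 4 in an 8^3 grid (z is the last axis).
tsdf = np.broadcast_to(np.arange(8, dtype=float) - 4.0, (8, 8, 8)).copy()
normals = tsdf_normals(tsdf)

reference = np.zeros((8, 8, 8, 3))
reference[..., 2] = 1.0                  # reference normals pointing along +z
mask = np.ones((8, 8, 8), dtype=bool)    # keep every voxel in this toy case

loss_aligned = masked_normal_loss(normals, reference, mask)
loss_flipped = masked_normal_loss(normals, -reference, mask)
```

In this toy case the aligned normals give a loss near zero, while flipped normals (angles greater than $90^{\circ}$, as in the blue points of panel (f)) give a loss near 2.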
