H2O-SDF: Two-phase Learning for 3D Indoor Reconstruction using Object Surface Fields

Minyoung Park; Mirae Do; YeonJae Shin; Jaeseok Yoo; Jongkwang Hong; Joongrock Kim; Chul Lee

TL;DR

Indoor 3D reconstruction struggles to jointly capture smooth room layouts and intricate object surfaces. The authors propose H2O-SDF, a two-phase approach comprising Holistic Surface Learning for global geometry and Object Surface Field (OSF) learning for object-specific details, augmented by normal-uncertainty-based loss reweighting and OSF-guided sampling. The OSF introduces a 3D cue that aligns object surfaces with the SDF without direct SDF supervision, addressing vanishing gradient issues and enabling high-frequency detail recovery via the losses $\mathcal{L}_{2d_{osf}}$, $\mathcal{L}_{3d_{osf}}$, and $\mathcal{L}_{ref}$. Extensive ablations and ScanNet evaluations show state-of-the-art geometry quality and improved object-detail fidelity, with robust normal predictions. The work advances practical indoor scene reconstruction and opens avenues for scene editing using the OSF signal as a 3D geometric prior.
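
To make the idea of normal-uncertainty-based loss reweighting concrete, the following is a minimal sketch, not the authors' formulation. It assumes a per-ray uncertainty in [0, 1] and a simple `1 - uncertainty` weight on an L1 normal loss. The function name `reweighted_normal_loss` and the weighting rule are hypothetical; the summary above does not give the exact formula.

```python
import numpy as np

def reweighted_normal_loss(pred_normals, prior_normals, uncertainty):
    """L1 normal loss averaged over rays, down-weighted where the prior is uncertain.

    Assumption (not from the paper): weight = 1 - uncertainty, uncertainty in [0, 1].
    """
    # Normalize both sets of normals to unit length.
    pred = pred_normals / np.linalg.norm(pred_normals, axis=-1, keepdims=True)
    prior = prior_normals / np.linalg.norm(prior_normals, axis=-1, keepdims=True)
    # Per-ray L1 distance between rendered and prior normals.
    per_ray = np.abs(pred - prior).sum(axis=-1)
    # Rays with an unreliable normal prior contribute less to the loss.
    weights = 1.0 - np.clip(uncertainty, 0.0, 1.0)
    return float((weights * per_ray).sum() / max(weights.sum(), 1e-8))

# Two rays: the first matches its prior, the second disagrees with it.
pred = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]])
prior = np.array([[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
print(reweighted_normal_loss(pred, prior, np.array([0.0, 0.0])))
print(reweighted_normal_loss(pred, prior, np.array([0.0, 1.0])))
```

In this sketch, marking the second ray as fully uncertain removes its disagreement from the loss, which is the qualitative behavior a reweighting scheme of this kind is meant to have.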

Abstract

Advanced techniques using Neural Radiance Fields (NeRF), Signed Distance Fields (SDF), and Occupancy Fields have recently emerged as solutions for 3D indoor scene reconstruction. We introduce a novel two-phase learning approach, H2O-SDF, that discriminates between object and non-object regions within indoor environments. This method achieves a nuanced balance, carefully preserving the geometric integrity of room layouts while also capturing intricate surface details of specific objects. A cornerstone of our two-phase learning framework is the introduction of the Object Surface Field (OSF), a novel concept designed to mitigate the persistent vanishing gradient problem that has previously hindered the capture of high-frequency details in other methods. Our proposed approach is validated through several experiments that include ablation studies.
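
The abstract does not say where the gradient vanishes. One place it commonly arises in SDF-based volume rendering is the conversion from signed distance to a rendering weight. The sketch below illustrates that effect under an assumption: a NeuS-style logistic density, which may differ from the conversion used in H2O-SDF. The function name `logistic_density` and the sharpness value `s = 64` are illustrative choices, not values from the paper.

```python
import numpy as np

def logistic_density(sdf, s):
    """Derivative of sigmoid(s * sdf) with respect to sdf.

    This is a bell-shaped function peaked at sdf = 0 (the surface). It is used
    here only as an assumed example of an SDF-to-weight conversion.
    """
    sig = 1.0 / (1.0 + np.exp(-s * sdf))
    return s * sig * (1.0 - sig)

# Signed distances: on the surface, slightly off it, and farther away.
sdf = np.array([0.0, 0.05, 0.2])
print(logistic_density(sdf, s=64.0))
```

The value drops by several orders of magnitude within a short distance of the zero level set. Under this assumed conversion, points on a thin structure that the current SDF places away from its surface therefore receive almost no rendering gradient, which is the kind of failure a separate 3D cue such as the OSF could help correct.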

Paper Structure (16 sections, 13 equations, 16 figures, 8 tables)

Introduction
Related Work
Neural Implicit Surface Representation
Neural 3D Reconstruction for Indoor Scenes
Our Method
Holistic Surface Learning
Object Surface Learning
Experiments
Experimental setting
Comparisons
Analysis
Conclusion
Appendix
Additional implementation details
Additional method details
...and 1 more section

Figures (16)

Figure 1: Comparison of Reconstruction Results
Figure 2: Architecture Overview. The main pipeline consists of two phases. During the first phase (green part), we learn the global indoor scene geometry through $\mathcal{L}_\mathbf{c}$ and $\mathcal{L}_\mathbf{n}$, re-weighted based on normal uncertainty, from an input position $\mathbf{x}$. During the second phase (blue part), we further train the Object Surface Field $osf(\mathbf{x})$ using $\mathcal{L}_{2d_{osf}}$, which is supervised by a 2D object mask; $\mathcal{L}_{3d_{osf}}$, which cross-guides between the OSF $osf(\mathbf{x})$ and the SDF $d(\mathbf{x})$; and $\mathcal{L}_{ref}$, which refines $osf$ with the point cloud $\mathbf{p}$. During this process, we apply an OSF-guided sampling strategy (green dot).
Figure 3: Comparisons of OSF. Compared to using only (a) $\mathcal{L}_{2d_{osf}}$, introducing (b) our $\mathcal{L}_{3d_{osf}}$ enables the OSF to represent precise object boundaries. Improvements include object surfaces (red) and non-object surfaces (blue).
Figure 4: Interaction of OSF and SDF. Illustration of (a) the initial status of the OSF, (b) the influence of the gradient of $\mathcal{L}_{3d_{osf}}$ with respect to the OSF, (c) the case when the SDF fails to capture a thin structure, (d) the influence of the gradient of $\mathcal{L}_{3d_{osf}}$ with respect to the SDF, and (e) the final result of the OSF and SDF. Interior refers to the region inside an object.
Figure 5: 3D Reconstruction Results on ScanNet. $\text{H}_2\text{O-SDF}$ shows improved reconstruction ability for both room-layout regions (blue box) and fine-grained object regions (red box) compared to other methods. Reconstructions for the remaining scenes are visualized in the Appendix.
...and 11 more figures
