
H2O-SDF: Two-phase Learning for 3D Indoor Reconstruction using Object Surface Fields

Minyoung Park, Mirae Do, YeonJae Shin, Jaeseok Yoo, Jongkwang Hong, Joongrock Kim, Chul Lee

TL;DR

Indoor 3D reconstruction struggles to capture both smooth room layouts and intricate object surfaces. The authors propose H2O-SDF, a two-phase approach that combines Holistic Surface Learning for global geometry with an Object Surface Field (OSF) for object-specific details, augmented by normal-uncertainty-based loss reweighting and OSF-guided sampling. The OSF provides a 3D cue that aligns object surfaces with the SDF without direct SDF supervision, mitigating the vanishing-gradient problem and enabling recovery of high-frequency detail through the losses $\mathcal{L}_{2d_{osf}}$, $\mathcal{L}_{3d_{osf}}$, and $\mathcal{L}_{ref}$. Extensive ablations and evaluations on ScanNet show state-of-the-art geometry quality, improved fidelity of object details, and robust normal predictions. The work advances practical indoor scene reconstruction and opens avenues for scene editing that uses the OSF signal as a 3D geometric prior.
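The TL;DR mentions normal-uncertainty-based reweighting of the color loss $\mathcal{L}_\mathbf{c}$ and the normal loss $\mathcal{L}_\mathbf{n}$, but this summary does not give the formula. The NumPy sketch below rests on stated assumptions, not on the authors' implementation: each ray carries an uncertainty u in [0, 1] for its normal prior, the color loss is weighted by u and the normal loss by 1 - u, and the normal error is an L1 term plus an angular (1 - cos) term. The function name `reweighted_losses` and the weighting rule are hypothetical.

```python
import numpy as np


def reweighted_losses(c_pred, c_gt, n_pred, n_gt, uncertainty):
    """Hypothetical normal-uncertainty-based reweighting of L_c and L_n.

    Assumed inputs (not taken from the paper):
      c_pred, c_gt: (R, 3) rendered and observed RGB per ray.
      n_pred, n_gt: (R, 3) rendered normals and monocular normal priors.
      uncertainty:  (R,) normal-prior uncertainty u in [0, 1].
    Assumed rule: rays with uncertain normal priors rely on color (weight u),
    rays with confident priors rely on normals (weight 1 - u).
    """
    u = np.clip(uncertainty, 0.0, 1.0)

    # Per-ray L1 color error, averaged over the RGB channels.
    color_err = np.abs(c_pred - c_gt).mean(axis=-1)

    # Per-ray normal error: L1 on unit normals plus an angular (1 - cos) term.
    n_p = n_pred / np.linalg.norm(n_pred, axis=-1, keepdims=True)
    n_g = n_gt / np.linalg.norm(n_gt, axis=-1, keepdims=True)
    normal_err = np.abs(n_p - n_g).sum(axis=-1) + (1.0 - (n_p * n_g).sum(axis=-1))

    loss_c = float(np.mean(u * color_err))
    loss_n = float(np.mean((1.0 - u) * normal_err))
    return loss_c, loss_n


if __name__ == "__main__":
    c_pred = np.ones((2, 3))                      # color error of 1 on both rays
    c_gt = np.zeros((2, 3))
    n_pred = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]])
    n_gt = np.array([[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]])  # 90 degrees off on both
    u = np.array([1.0, 0.0])                      # ray 0 uncertain, ray 1 confident
    print(reweighted_losses(c_pred, c_gt, n_pred, n_gt, u))
```

The demo prints `(0.5, 1.5)`: both rays have the same color and normal errors, but the uncertain ray contributes only to the color loss and the confident ray only to the normal loss.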

Abstract

Advanced techniques using Neural Radiance Fields (NeRF), Signed Distance Fields (SDF), and Occupancy Fields have recently emerged as solutions for 3D indoor scene reconstruction. We introduce a novel two-phase learning approach, H2O-SDF, that discriminates between object and non-object regions within indoor environments. This method achieves a nuanced balance, carefully preserving the geometric integrity of room layouts while also capturing intricate surface details of specific objects. A cornerstone of our two-phase learning framework is the introduction of the Object Surface Field (OSF), a novel concept designed to mitigate the persistent vanishing gradient problem that has previously hindered the capture of high-frequency details in other methods. Our proposed approach is validated through several experiments that include ablation studies.
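The abstract attributes the loss of high-frequency detail to a vanishing gradient problem. This summary does not spell out the mechanism, so the sketch below is only a generic illustration under an assumption not stated here: a NeuS-style logistic density $\phi_s(d) = s\,\sigma(sd)\,(1-\sigma(sd))$ of the SDF value $d$. Its derivative with respect to $d$ decays exponentially as $|d|$ grows, so a sample whose predicted SDF is far from zero (for instance, a point on a thin structure the SDF has missed) receives almost no gradient through this density. The sharpness value s = 64 is hypothetical.

```python
import numpy as np


def sigmoid(x):
    # Numerically stable logistic sigmoid via tanh (no overflow for large |x|).
    return 0.5 * (1.0 + np.tanh(0.5 * x))


def logistic_density(d, s):
    # Assumed NeuS-style density: phi_s(d) = s * sigma(s d) * (1 - sigma(s d)).
    sig = sigmoid(s * d)
    return s * sig * (1.0 - sig)


def logistic_density_grad(d, s):
    # Analytic derivative with respect to the SDF value d:
    # d(phi_s)/dd = s^2 * sigma * (1 - sigma) * (1 - 2 * sigma).
    sig = sigmoid(s * d)
    return s**2 * sig * (1.0 - sig) * (1.0 - 2.0 * sig)


if __name__ == "__main__":
    s = 64.0  # hypothetical sharpness; larger s concentrates density near d = 0
    for d in [0.0, 0.01, 0.05, 0.2, 1.0]:
        print(f"d={d:5.2f}  phi={logistic_density(d, s):.3e}  "
              f"dphi/dd={logistic_density_grad(d, s):.3e}")
```

With s = 64 the gradient magnitude peaks near |d| ≈ 0.02 and is already several orders of magnitude smaller at d = 0.2, which illustrates why a signal that does not depend on the SDF being near zero can help.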

Paper Structure

This paper contains 16 sections, 13 equations, 16 figures, 8 tables.

Figures (16)

  • Figure 1: Comparison of Reconstruction Results
  • Figure 2: Architecture Overview. The main pipeline consists of two phases. In the first phase (green part), we learn the global geometry of the indoor scene from an input position $\mathbf{x}$ using $\mathcal{L}_\mathbf{c}$ and $\mathcal{L}_\mathbf{n}$, re-weighted based on normal uncertainty. In the second phase (blue part), we additionally train the Object Surface Field $osf(\mathbf{x})$ using $\mathcal{L}_{2d_{osf}}$, which is supervised by a 2D object mask; $\mathcal{L}_{3d_{osf}}$, which cross-guides between the OSF $osf(\mathbf{x})$ and the SDF $d(\mathbf{x})$; and $\mathcal{L}_{ref}$, which refines $osf$ with the point cloud $\mathbf{p}$. Throughout this process, we apply an OSF-guided sampling strategy (green dot).
  • Figure 3: Comparisons of OSF. Compared to using only (a) $\mathcal{L}_{2d_{osf}}$, introducing our (b) $\mathcal{L}_{3d_{osf}}$ enables the OSF to represent precise object boundaries. Improvements appear on both object surfaces (red) and non-object surfaces (blue).
  • Figure 4: Interaction of OSF and SDF. Illustration of (a) the initial status of OSF, (b) the influence of the gradient of $\mathcal{L}_{3d_{osf}}$ with respect to OSF, (c) the case when SDF fails to capture thin structure, (d) the influence of the gradient of $\mathcal{L}_{3d_{osf}}$ with respect to SDF, and (e) the final result of OSF and SDF. Interior refers to the region inside an object.
  • Figure 5: 3D Reconstruction Results on ScanNet. $\text{H}_2\text{O-SDF}$ shows improved reconstruction of both room-layout regions (blue box) and fine-grained object regions (red box) compared to other methods. Reconstructions for the remaining scenes are visualized in the Appendix.
  • ...and 11 more figures
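Figure 2 names two OSF-specific components: $\mathcal{L}_{2d_{osf}}$, supervised by a 2D object mask, and an OSF-guided sampling strategy. The summary gives neither formula, so the NumPy sketch below rests on stated assumptions: per-sample volume-rendering weights are already available, the rendered OSF is compared to the mask with binary cross-entropy, and extra samples are drawn by NeRF-style inverse-CDF sampling with bin probabilities proportional to the OSF. All function names (`render_osf`, `osf_mask_loss`, `osf_guided_samples`) are hypothetical.

```python
import numpy as np


def render_osf(weights, osf):
    """Composite per-sample OSF values into one value per ray.

    weights: (R, S) volume-rendering weights along each ray (assumed given).
    osf:     (R, S) OSF predictions in [0, 1] at the same sample points.
    """
    return (weights * osf).sum(axis=-1)


def osf_mask_loss(rendered_osf, mask, eps=1e-6):
    """Assumed form of L_2d_osf: binary cross-entropy against a 2D object mask."""
    p = np.clip(rendered_osf, eps, 1.0 - eps)
    return float(np.mean(-(mask * np.log(p) + (1.0 - mask) * np.log(1.0 - p))))


def osf_guided_samples(t_coarse, osf, n_fine, rng):
    """Draw extra depths along one ray, concentrated where the OSF is high.

    Assumption: NeRF-style inverse-CDF sampling over the coarse bins, with each
    bin's probability proportional to the OSF at its left edge.
    t_coarse: (S,) sorted depths; osf: (S,) OSF values at those depths.
    """
    pdf = osf[:-1] + 1e-5                         # small floor keeps every bin reachable
    pdf = pdf / pdf.sum()
    cdf = np.concatenate([[0.0], np.cumsum(pdf)])
    cdf[-1] = 1.0                                 # guard against round-off
    u = rng.uniform(size=n_fine)
    idx = np.searchsorted(cdf, u, side="right") - 1  # bin that contains each u
    idx = np.clip(idx, 0, len(pdf) - 1)
    # Place each sample linearly inside its bin according to where u falls.
    frac = (u - cdf[idx]) / np.maximum(cdf[idx + 1] - cdf[idx], 1e-12)
    t_fine = t_coarse[idx] + frac * (t_coarse[idx + 1] - t_coarse[idx])
    return np.sort(t_fine)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 1.0, 11)
    osf = np.zeros_like(t)
    osf[5] = 1.0                                  # toy object surface near t = 0.5
    fine = osf_guided_samples(t, osf, 1000, rng)
    print("fraction in [0.5, 0.6]:", np.mean((fine >= 0.5) & (fine <= 0.6)))
```

In the demo, nearly all 1,000 fine samples fall in the bin where the toy OSF peaks, so sampling effort concentrates where the OSF marks an object surface. The small floor on the PDF is a design choice that keeps every bin reachable even where the OSF is zero.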