Behind the Veil: Enhanced Indoor 3D Scene Reconstruction with Occluded Surfaces Completion

Su Sun, Cheng Zhao, Yuliang Guo, Ruoyu Wang, Xinyu Huang, Yingjie Victor Chen, Liu Ren

TL;DR

The paper tackles indoor 3D scene reconstruction from depth sequences, including completion of occluded surfaces, by introducing a coarse-fine hierarchical octree feature volume and a dual-decoder architecture. A scene-specific Geo-decoder, optimized online, reconstructs visible geometry, while a 3D Inpainter trained offline across scenes generalizes to occluded regions. A three-stage pipeline (offline Inpainter training, online per-scene optimization, complete-surface generation) then produces a complete scene mesh. On the 3D-CRS and iTHOR datasets, the method improves reconstruction completeness over prior state-of-the-art methods by 16.8% and 24.2%, respectively. More complete indoor 3D models of this kind support editable, room-scale scene representations for AR/VR and embodied AI. The 3D-CRS dataset is provided on the project webpage.

Abstract

In this paper, we present a novel indoor 3D reconstruction method with occluded surface completion, given a sequence of depth readings. Prior state-of-the-art (SOTA) methods focus only on reconstructing the visible areas of a scene and neglect the areas hidden by occlusion, e.g., the contact surfaces between pieces of furniture and the occluded parts of walls and floors. Our method tackles the task of completing the occluded scene surfaces, resulting in a complete 3D scene mesh. The core idea is to learn a 3D geometry prior from various complete scenes and use it to infer the occluded geometry of an unseen scene solely from depth measurements. We design a coarse-fine hierarchical octree representation coupled with a dual-decoder architecture, i.e., a Geo-decoder and a 3D Inpainter, which jointly reconstruct the complete 3D scene geometry. The Geo-decoder, with a detailed representation at fine levels, is optimized online for each scene to reconstruct visible surfaces. The 3D Inpainter, with an abstract representation at coarse levels, is trained offline on various scenes to complete occluded surfaces. As a result, while the Geo-decoder is specialized for an individual scene, the 3D Inpainter can be applied generally across different scenes. We evaluate the proposed method on the 3D Completed Room Scene (3D-CRS) and iTHOR datasets, where it significantly outperforms the SOTA methods, with gains of 16.8% and 24.2%, respectively, in the completeness of 3D reconstruction. The 3D-CRS dataset, which includes a complete 3D mesh of each scene, is provided on the project webpage.
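To make the coarse-fine, dual-decoder idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes dense grids as a stand-in for octree levels, nearest-cell feature lookup, single linear layers as decoders, a scalar output per query point, and that each decoder reads only its own levels. The resolutions, the feature size, the coarse/fine split, and the names `lookup`, `linear_decoder`, and `query` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed values for illustration only; none of them come from the paper.
LEVEL_RESOLUTIONS = [4, 8, 16, 32]  # coarse -> fine; dense grids stand in for octree levels
FEATURE_DIM = 8
NUM_COARSE = 2                      # first two levels count as "coarse"

grids = [
    rng.normal(scale=0.1, size=(r, r, r, FEATURE_DIM)) for r in LEVEL_RESOLUTIONS
]


def lookup(grid, points):
    """Nearest-cell feature lookup for points inside the unit cube [0, 1)^3."""
    res = grid.shape[0]
    idx = np.clip((points * res).astype(int), 0, res - 1)
    return grid[idx[:, 0], idx[:, 1], idx[:, 2]]


def linear_decoder(in_dim):
    """Toy decoder: one linear layer mapping features to a scalar per point."""
    weights = rng.normal(scale=0.1, size=(in_dim, 1))
    return lambda feats: (feats @ weights)[:, 0]


# Geo-decoder reads the fine levels, the 3D Inpainter reads the coarse levels.
geo_decoder = linear_decoder(FEATURE_DIM * (len(grids) - NUM_COARSE))
inpainter = linear_decoder(FEATURE_DIM * NUM_COARSE)


def query(points):
    """Return (Geo-decoder value, Inpainter value) for each query point."""
    coarse = np.concatenate([lookup(g, points) for g in grids[:NUM_COARSE]], axis=1)
    fine = np.concatenate([lookup(g, points) for g in grids[NUM_COARSE:]], axis=1)
    return geo_decoder(fine), inpainter(coarse)


visible_value, occluded_value = query(rng.random((5, 3)))
```

In this sketch the two decoders share one hierarchy but consume different levels, which mirrors the division of labour described in the abstract: fine levels carry scene-specific detail, coarse levels carry the more abstract context used for completion.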

Paper Structure (15 sections, 10 equations, 8 figures, 4 tables)

Figures (8)

  • Figure 1: Occluded Surface Completion: Our approach completes occluded surfaces in areas that are invisible to existing reconstruction methods. It enables more accurate 3D modelling of furniture and the reconstruction of room regions obscured by furniture.
  • Figure 2: Coarse-fine hierarchical octree feature volume
  • Figure 3: Three-stage pipeline. Stage 1: 3D Inpainter training on Scenes 0, ..., N-1, with the complete 3D scene meshes provided as ground truth. Stage 2: joint optimization of the Geo-decoder and octree feature volume on test Scene N. Only the visible depth images are provided as supervision; the complete 3D scene mesh is not available. The parameters of the Geo-decoder and the octree feature volume are updated, while the 3D Inpainter parameters are frozen. Stage 3: complete 3D surface generation on test Scene N.
  • Figure 4: Visual comparison of room layout after furniture removal on the 3D-CRS dataset
  • Figure 5: Visualization on the ScanNet dataset. Row 1: ours; row 2: ground truth (GT).
  • ...and 3 more figures
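The Figure 3 caption describes which parameters are updated at each stage. A minimal sketch of that freezing schedule follows, assuming plain gradient descent and placeholder gradients. The group names, the `TRAINABLE_BY_STAGE` table, and `sgd_step` are hypothetical, and stage 1 lists only the Inpainter because the caption mentions nothing else being trained there.

```python
import numpy as np

# Hypothetical parameter-group names; the caption does not name them this way.
TRAINABLE_BY_STAGE = {
    1: {"inpainter"},                      # offline, Scenes 0..N-1, complete meshes as ground truth
    2: {"geo_decoder", "feature_volume"},  # online, Scene N, depth supervision only
    3: set(),                              # surface generation, nothing is updated
}


def sgd_step(params, grads, stage, lr=0.1):
    """Apply one gradient step to the groups trainable in `stage`; keep the rest frozen."""
    trainable = TRAINABLE_BY_STAGE[stage]
    return {
        name: value - lr * grads[name] if name in trainable else value.copy()
        for name, value in params.items()
    }


params = {name: np.ones(3) for name in ("inpainter", "geo_decoder", "feature_volume")}
grads = {name: np.full(3, 0.5) for name in params}  # placeholder gradients

after_stage2 = sgd_step(params, grads, stage=2)
```

After the stage-2 step, the `inpainter` entry is unchanged while `geo_decoder` and `feature_volume` have moved, which is the behaviour the caption describes for test Scene N.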