Behind the Veil: Enhanced Indoor 3D Scene Reconstruction with Occluded Surfaces Completion
Su Sun, Cheng Zhao, Yuliang Guo, Ruoyu Wang, Xinyu Huang, Yingjie Victor Chen, Liu Ren
TL;DR
The paper tackles indoor 3D scene reconstruction from depth sequences, including completion of occluded surfaces, by introducing a coarse-fine hierarchical octree feature volume and a dual-decoder architecture. A scene-specific Geo-decoder, optimized online, reconstructs visible geometry, while a 3D Inpainter, trained offline across many scenes, generalizes to occluded regions; together they yield a complete scene mesh. Occlusion priors are thus learned offline and visible surfaces are fitted online, after which a three-stage pipeline produces the complete mesh. On the 3D-CRS and iTHOR datasets, the method improves reconstruction completeness over prior SOTA methods by 16.8% and 24.2%. The resulting models are more complete and easier to manipulate, which supports editable, room-scale scene representations for AR/VR and embodied AI. The 3D-CRS dataset is made available on the project webpage for future research.
Abstract
In this paper, we present a novel indoor 3D reconstruction method with occluded surface completion, given a sequence of depth readings. Prior state-of-the-art (SOTA) methods focus only on reconstructing the visible areas of a scene and neglect the areas hidden by occlusion, e.g., contact surfaces between pieces of furniture and occluded walls and floors. Our method tackles the task of completing the occluded scene surfaces, resulting in a complete 3D scene mesh. The core idea of our method is to learn a 3D geometry prior from various complete scenes and use it to infer the occluded geometry of an unseen scene solely from depth measurements. We design a coarse-fine hierarchical octree representation coupled with a dual-decoder architecture, i.e., a Geo-decoder and a 3D Inpainter, which jointly reconstruct the complete 3D scene geometry. The Geo-decoder, with a detailed representation at fine levels, is optimized online for each scene to reconstruct visible surfaces. The 3D Inpainter, with an abstract representation at coarse levels, is trained offline on various scenes to complete occluded surfaces. As a result, while the Geo-decoder is specialized for an individual scene, the 3D Inpainter can be applied generally across different scenes. We evaluate the proposed method on the 3D Completed Room Scene (3D-CRS) and iTHOR datasets, significantly outperforming the SOTA methods with gains of 16.8% and 24.2% in the completeness of 3D reconstruction. The 3D-CRS dataset, which includes a complete 3D mesh of each scene, is provided on the project webpage.
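The coarse-fine, dual-decoder idea described above can be illustrated with a minimal sketch. It is not the authors' implementation. It rests on several assumptions that do not come from the paper: dense feature grids stand in for the octree levels, both decoders are tiny untrained MLPs that output a signed distance value, and a hypothetical per-point visibility mask selects which decoder's prediction is used. All function names, grid resolutions, and layer sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_levels(resolutions, feat_dim):
    # One dense feature grid per level (assumption: dense stand-in for octree levels).
    return [rng.normal(0.0, 0.01, size=(r, r, r, feat_dim)) for r in resolutions]

def trilinear(grid, pts):
    # Interpolate per-point features from an (R, R, R, C) grid; pts lie in [0, 1]^3.
    res = grid.shape[0]
    x = np.clip(pts, 0.0, 1.0) * (res - 1)
    i0 = np.minimum(np.floor(x).astype(int), res - 2)  # lower corner of the cell
    t = x - i0                                         # offset inside the cell
    out = np.zeros((pts.shape[0], grid.shape[-1]))
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                wx = t[:, 0] if dx else 1.0 - t[:, 0]
                wy = t[:, 1] if dy else 1.0 - t[:, 1]
                wz = t[:, 2] if dz else 1.0 - t[:, 2]
                corner = grid[i0[:, 0] + dx, i0[:, 1] + dy, i0[:, 2] + dz]
                out += (wx * wy * wz)[:, None] * corner
    return out

def make_mlp(in_dim, hidden, out_dim):
    # Randomly initialized two-layer MLP (untrained; for shape illustration only).
    return {"W1": rng.normal(0.0, 0.1, (in_dim, hidden)), "b1": np.zeros(hidden),
            "W2": rng.normal(0.0, 0.1, (hidden, out_dim)), "b2": np.zeros(out_dim)}

def mlp(params, x):
    h = np.maximum(x @ params["W1"] + params["b1"], 0.0)  # ReLU hidden layer
    return h @ params["W2"] + params["b2"]

def level_features(levels, ids, pts):
    # Concatenate interpolated features from the selected levels.
    return np.concatenate([trilinear(levels[i], pts) for i in ids], axis=1)

def query_sdf(levels, pts, coarse_ids, fine_ids, geo_decoder, inpainter, visible):
    # Fine-level features feed the Geo-decoder (visible surfaces);
    # coarse-level features feed the 3D Inpainter (occluded surfaces).
    sdf_visible = mlp(geo_decoder, level_features(levels, fine_ids, pts))[:, 0]
    sdf_occluded = mlp(inpainter, level_features(levels, coarse_ids, pts))[:, 0]
    # Assumption: a visibility mask chooses between the two predictions.
    return np.where(visible, sdf_visible, sdf_occluded)

levels = make_levels(resolutions=(4, 8, 16, 32), feat_dim=4)
coarse_ids, fine_ids = (0, 1), (2, 3)
geo_decoder = make_mlp(in_dim=8, hidden=16, out_dim=1)  # would be optimized per scene
inpainter = make_mlp(in_dim=8, hidden=16, out_dim=1)    # would be trained across scenes
pts = rng.uniform(size=(6, 3))
visible = np.array([True, True, False, True, False, False])
sdf = query_sdf(levels, pts, coarse_ids, fine_ids, geo_decoder, inpainter, visible)
```

In this sketch, the split between scene-specific and cross-scene components would show up only in training: the fine grids and `geo_decoder` would be fitted to one scene's depth readings, while `inpainter` would keep weights learned from many complete scenes.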
