Online 3D Scene Reconstruction Using Neural Object Priors
Thomas Chabal, Shizhe Chen, Jean Ponce, Cordelia Schmid
TL;DR
This work tackles online, object-level scene reconstruction from RGB-D video using an object-centric neural implicit representation in which each object is modeled by its own feature grid and a small MLP. The first contribution is a feature-grid interpolation mechanism that incrementally extends an object's grid as new parts of the object come into view, which is what makes online operation possible. The second contribution is an object library of previously reconstructed shapes: a matching prior is retrieved, registered to the current observation, and used to initialize the current object's model, while keyframes synthesized from the prior help prevent forgetting of previously observed details. Experiments on Replica, ScanNet, and lab-recorded sequences show that object priors improve reconstruction accuracy and completeness, outperforming several state-of-the-art NeRF-based and TSDF baselines while maintaining online efficiency. The approach yields more faithful, complete, and reusable object reconstructions in dynamic scenes, with practical implications for AR, robotics, and virtual reality.
Abstract
This paper addresses the problem of reconstructing a scene online at the level of objects given an RGB-D video sequence. While current object-aware neural implicit representations hold promise, they are limited in online reconstruction efficiency and shape completion. Our main contributions to alleviate the above limitations are twofold. First, we propose a feature grid interpolation mechanism to continuously update grid-based object-centric neural implicit representations as new object parts are revealed. Second, we construct an object library with previously mapped objects in advance and leverage the corresponding shape priors to initialize geometric object models in new videos, subsequently completing them with novel views as well as synthesized past views to avoid losing original object details. Extensive experiments on synthetic environments from the Replica dataset, real-world ScanNet sequences and videos captured in our laboratory demonstrate that our approach outperforms state-of-the-art neural implicit models for this task in terms of reconstruction accuracy and completeness.
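The abstract does not spell out how the feature grid interpolation works, so the following is only a minimal sketch of one way such an update could look, not the authors' implementation. It assumes a dense, axis-aligned grid whose vertices span the object's bounding box, resamples the old features at the vertices of an enlarged grid by trilinear interpolation, and zero-initializes vertices that fall outside the old box. The function names `trilinear_sample` and `extend_grid`, and the zero initialization, are hypothetical choices made for illustration.

```python
import numpy as np


def trilinear_sample(feats, lo, hi, pts):
    """Trilinearly interpolate feats (nx, ny, nz, C) at points pts (N, 3).

    Assumption: grid vertices are evenly spaced from lo to hi (inclusive)
    and every axis has at least two vertices.
    Returns (values, inside), where inside marks points within [lo, hi].
    """
    res = np.array(feats.shape[:3])
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    inside = np.all((pts >= lo) & (pts <= hi), axis=1)
    # Continuous vertex coordinates, clamped so that indexing stays valid.
    u = np.clip((pts - lo) / (hi - lo) * (res - 1), 0, res - 1)
    i0 = np.minimum(np.floor(u).astype(int), res - 2)  # lower corner index
    t = u - i0  # fractional offsets in [0, 1]
    out = np.zeros((len(pts), feats.shape[3]))
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                wx = t[:, 0] if dx else 1.0 - t[:, 0]
                wy = t[:, 1] if dy else 1.0 - t[:, 1]
                wz = t[:, 2] if dz else 1.0 - t[:, 2]
                corner = feats[i0[:, 0] + dx, i0[:, 1] + dy, i0[:, 2] + dz]
                out += (wx * wy * wz)[:, None] * corner
    return out, inside


def extend_grid(feats, lo, hi, new_lo, new_hi, new_res):
    """Build a grid over [new_lo, new_hi] initialized from the old grid.

    Vertices inside the old box take interpolated old features; vertices
    outside it are set to zero (an illustrative choice, not from the paper).
    """
    new_lo, new_hi = np.asarray(new_lo, float), np.asarray(new_hi, float)
    axes = [np.linspace(new_lo[k], new_hi[k], new_res[k]) for k in range(3)]
    pts = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, 3)
    vals, inside = trilinear_sample(feats, lo, hi, pts)
    vals[~inside] = 0.0
    return vals.reshape(*new_res, feats.shape[3])


# Usage: an object first seen inside the unit cube later grows to [0, 2]^3.
old_feats = np.random.default_rng(0).normal(size=(3, 3, 3, 8))
new_feats = extend_grid(old_feats, [0, 0, 0], [1, 1, 1],
                        [0, 0, 0], [2, 2, 2], (5, 5, 5))
```

In this sketch, features already learned for the observed part of the object carry over to the enlarged grid, so optimization can continue from the current state rather than restarting when the bounding box changes.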
