Unsupervised Discovery of Object-Centric Neural Fields
Rundong Luo, Hong-Xing Yu, Jiajun Wu
TL;DR
The paper tackles unsupervised discovery of 3D object-centric scene representations from a single image. Prior methods encode objects in the viewer's coordinate frame, entangling intrinsic attributes such as shape and appearance with extrinsic properties such as 3D location, and therefore struggle to generalize. The paper introduces Unsupervised discovery of Object-Centric neural Fields (uOCF), which disentangles object intrinsics from extrinsics and renders each object with an object-centric NeRF. This yields translation-invariant object representations that are learned from sparse multi-view images and inferred from a single image. A two-stage training regime first learns 3D object priors from simple synthetic scenes and then transfers them to more complex real scenes, aided by a set of losses and an object-centric sampling strategy; test-time optimization further supports zero-shot generalization. Empirically, uOCF outperforms state-of-the-art baselines on multiple tasks, generalizes to unseen configurations and objects, and enables 3D object segmentation and scene manipulation in real-world kitchen environments. Datasets and code are to be released.
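The intrinsics/extrinsics split can be illustrated with a minimal sketch. It assumes that extrinsics are only a 3D translation, matching the translation invariance described above, and it substitutes a hand-written sphere for a learned NeRF. The names `ObjectSlot`, `to_object_frame`, `query`, and `toy_sphere` are hypothetical and are not taken from the uOCF code. A world-space query point is shifted into the object's local frame before the intrinsic field is evaluated, so one set of intrinsics describes the object wherever it is placed.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

Vec3 = Tuple[float, float, float]
# Hypothetical intrinsic field: maps a point in the object's local frame
# to (density, rgb). In uOCF this role is played by a learned neural field.
IntrinsicField = Callable[[Vec3], Tuple[float, Vec3]]


@dataclass
class ObjectSlot:
    """Hypothetical container: intrinsics (a field) plus extrinsics (a position)."""
    field: IntrinsicField
    position: Vec3


def to_object_frame(point_world: Vec3, position: Vec3) -> Vec3:
    # Assumed translation-only extrinsics: subtract the object's 3D location.
    return tuple(p - c for p, c in zip(point_world, position))


def query(obj: ObjectSlot, point_world: Vec3) -> Tuple[float, Vec3]:
    # Object-centric query: evaluate the intrinsic field in local coordinates.
    return obj.field(to_object_frame(point_world, obj.position))


def toy_sphere(p_local: Vec3, radius: float = 0.5) -> Tuple[float, Vec3]:
    # Toy intrinsic field: a solid red sphere centered at the local origin.
    inside = sum(x * x for x in p_local) <= radius ** 2
    return (10.0 if inside else 0.0, (1.0, 0.0, 0.0))


# Same intrinsics placed at two different locations.
a = ObjectSlot(toy_sphere, (0.0, 0.0, 0.0))
b = ObjectSlot(toy_sphere, (2.0, 1.0, 0.0))

# Object-centric queries agree for corresponding points.
print(query(b, (2.2, 1.0, 0.0)) == query(a, (0.2, 0.0, 0.0)))  # True
# Evaluating the field directly in world coordinates (a viewer-centric
# encoding) misses the moved object entirely.
print(toy_sphere((2.2, 1.0, 0.0)))  # (0.0, (1.0, 0.0, 0.0))
```

The contrast in the last two lines mirrors the limitation attributed to prior methods: when location is baked into the representation, the same object must be relearned at every position, whereas the object-centric query reuses one field.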
Abstract
We study inferring 3D object-centric scene representations from a single image. While recent methods have shown potential in unsupervised 3D object discovery from simple synthetic images, they fail to generalize to real-world scenes with visually rich and diverse objects. This limitation stems from their object representations, which entangle objects' intrinsic attributes like shape and appearance with extrinsic, viewer-centric properties such as their 3D location. To address this bottleneck, we propose Unsupervised discovery of Object-Centric neural Fields (uOCF). uOCF focuses on learning the intrinsics of objects and models the extrinsics separately. Our approach significantly improves systematic generalization, thus enabling unsupervised learning of high-fidelity object-centric scene representations from sparse real-world images. To evaluate our approach, we collect three new datasets, including two real kitchen environments. Extensive experiments show that uOCF enables unsupervised discovery of visually rich objects from a single real image, allowing applications such as 3D object segmentation and scene manipulation. Notably, uOCF demonstrates zero-shot generalization to unseen objects from a single real image. Project page: https://red-fairy.github.io/uOCF/
