Disentangled 3D Scene Generation with Layout Learning
Dave Epstein, Ben Poole, Ben Mildenhall, Alexei A. Efros, Aleksander Holynski
TL;DR
Disentangled 3D Scene Generation with Layout Learning introduces an unsupervised approach that decomposes generated scenes into objects by jointly optimizing $K$ NeRFs and a set of learnable layouts under a pretrained text-to-image diffusion prior via score distillation sampling (SDS). Objects are defined as parts of a scene that can be rearranged by affine transforms while the scene remains valid. The $K$ NeRFs are composited into a single scene by summing their densities, $\tau' = \sum_k \tau_k$, and averaging their colors weighted by density, $\boldsymbol{\rho}' = \sum_k (\tau_k/\tau') \boldsymbol{\rho}_k$, which produces coherent multi-object scenes. The method yields high-quality, editable 3D scenes and enables object-level manipulation and asset integration without supervision, with quantitative CLIP-based evaluation showing competitive disentanglement performance relative to per-object supervision. Limitations include the inherent ill-posedness of 3D disentanglement, failure modes such as the Janus problem, and diffusion-model biases, underscoring ongoing challenges and ethical considerations in unsupervised text-to-3D generation.
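The compositing rule in the TL;DR can be illustrated with a minimal NumPy sketch. It assumes the $K$ object fields have already been evaluated at the same $N$ sample points; the function name `composite`, the array shapes, and the `eps` guard against division by zero in empty space are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def composite(densities, colors, eps=1e-8):
    """Combine K per-object fields evaluated at the same N sample points.

    densities: (K, N) non-negative densities tau_k
    colors:    (K, N, 3) colors rho_k
    Returns the composite density tau' with shape (N,) and the
    composite color rho' with shape (N, 3).
    """
    densities = np.asarray(densities, dtype=np.float64)
    colors = np.asarray(colors, dtype=np.float64)
    # tau' = sum_k tau_k
    total = densities.sum(axis=0)
    # Weights tau_k / tau'; eps (an assumption) avoids 0/0 where all densities vanish.
    weights = densities / np.maximum(total, eps)
    # rho' = sum_k (tau_k / tau') rho_k
    color = (weights[..., None] * colors).sum(axis=0)
    return total, color

# Two objects at one sample point: a sparse red one and a denser blue one.
tau, rho = composite(
    densities=[[1.0], [3.0]],
    colors=[[[1.0, 0.0, 0.0]], [[0.0, 0.0, 1.0]]],
)
```

In this example the composite density is 4 and the color is one quarter red, three quarters blue, so the denser object dominates the blend at that point.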
Abstract
We introduce a method to generate 3D scenes that are disentangled into their component objects. This disentanglement is unsupervised, relying only on the knowledge of a large pretrained text-to-image model. Our key insight is that objects can be discovered by finding parts of a 3D scene that, when rearranged spatially, still produce valid configurations of the same scene. Concretely, our method jointly optimizes multiple NeRFs from scratch, each representing its own object, along with a set of layouts that composite these objects into scenes. We then encourage these composited scenes to be in-distribution according to the image generator. We show that despite its simplicity, our approach successfully generates 3D scenes decomposed into individual objects, enabling new capabilities in text-to-3D content creation. For results and an interactive demo, see our project page at https://dave.ml/layoutlearning/
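As a rough illustration of what a layout could look like in code, the sketch below represents one layout as $K$ homogeneous $4 \times 4$ affine matrices and applies each to a shared set of 3D sample points, giving the coordinates at which each object's NeRF would be queried. The matrix parameterization, the direction of the mapping (scene frame to object frame), and the helper names `to_object_frames` and `translation` are assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np

def translation(offset):
    """Build a 4x4 homogeneous matrix that translates by `offset` (hypothetical helper)."""
    matrix = np.eye(4)
    matrix[:3, 3] = offset
    return matrix

def to_object_frames(points, layout):
    """Apply each object's affine transform to the same sample points.

    points: (N, 3) sample points in the scene frame
    layout: (K, 4, 4) one homogeneous affine matrix per object
    Returns (K, N, 3): the points as seen by each of the K objects.
    """
    points = np.asarray(points, dtype=np.float64)
    layout = np.asarray(layout, dtype=np.float64)
    # Append a homogeneous coordinate of 1 to every point: (N, 4).
    ones = np.ones((points.shape[0], 1))
    homogeneous = np.concatenate([points, ones], axis=1)
    # moved[k, n] = layout[k] @ homogeneous[n]
    moved = np.einsum("kij,nj->kni", layout, homogeneous)
    return moved[..., :3]

# One layout for K = 2 objects: the first stays put, the second is shifted along x.
layout = np.stack([np.eye(4), translation([1.0, 0.0, 0.0])])
per_object_points = to_object_frames([[0.0, 0.0, 0.0]], layout)
```

Under these assumptions, learning several layouts amounts to optimizing several such `(K, 4, 4)` stacks alongside the NeRFs and picking one per optimization step, so that each object must look plausible in more than one arrangement.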
