From Programs to Poses: Factored Real-World Scene Generation via Learned Program Libraries
Joy Hsu, Emily Jin, Jiajun Wu, Niloy J. Mitra
TL;DR
FactoredScenes tackles real-world indoor scene generation under limited data by factoring scenes into room-layout programs and pose variations. It learns a library of reusable room-structure functions from synthetic data, uses an LLM to generate scene programs regularized by this library, executes the programs to obtain layouts, and then hierarchically predicts oriented object poses before retrieving concrete 3D instances. The approach achieves state-of-the-art realism on ScanNet-like data, with strong FID/KID scores and human studies showing that generated rooms closely resemble real ones, even when tested on unseen scenes. Because the pipeline is modular and regularized by the learned library, it can draw on diverse data sources and offers a scalable path toward realistic, program-driven scene synthesis for datasets and benchmarks.
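To make the program-to-layout step concrete, here is a minimal Python sketch. It assumes that a scene program is a list of calls to library functions and that each call returns object placements. The function names (`bed_against_wall`, `desk_with_chair`, `row_of`), the `execute` helper, and the coordinate conventions are all hypothetical. FactoredScenes learns its library from synthetic data and has an LLM write the programs, whereas these functions are hand-written for illustration only.

```python
import math
from dataclasses import dataclass


@dataclass
class Placement:
    """One object in an executed layout: category, floor position (m), yaw (rad)."""
    category: str
    x: float
    y: float
    theta: float


# Hypothetical, hand-written stand-ins for learned library functions.
# Each takes the room size plus its own arguments and returns placements.
def bed_against_wall(room_w, room_d):
    # Bed centered along the back wall (y = 0).
    return [Placement("bed", room_w / 2, 1.0, 0.0)]


def desk_with_chair(room_w, room_d, x, y):
    # A desk with a chair 0.6 m in front of it, turned to face the desk.
    return [Placement("desk", x, y, 0.0), Placement("chair", x, y + 0.6, math.pi)]


def row_of(room_w, room_d, category, n, x0, y, spacing):
    # n copies of one category, evenly spaced along the x axis.
    return [Placement(category, x0 + i * spacing, y, 0.0) for i in range(n)]


LIBRARY = {f.__name__: f for f in (bed_against_wall, desk_with_chair, row_of)}


def execute(program, library, room_w, room_d):
    """Run a program (a list of (function_name, kwargs) calls) into a flat layout.

    Rejecting calls to functions outside the library is a crude stand-in
    for the library regularization described in the text.
    """
    layout = []
    for name, kwargs in program:
        if name not in library:
            raise ValueError(f"program calls {name!r}, which is not in the library")
        layout.extend(library[name](room_w=room_w, room_d=room_d, **kwargs))
    return layout


# A program of the kind an LLM might emit for a 4 m x 4 m bedroom.
program = [
    ("bed_against_wall", {}),
    ("desk_with_chair", {"x": 0.8, "y": 3.0}),
    ("row_of", {"category": "shelf", "n": 2, "x0": 2.0, "y": 3.8, "spacing": 0.9}),
]
layout = execute(program, LIBRARY, room_w=4.0, room_d=4.0)
for p in layout:
    print(p)
```

Because the program is restricted to named library calls, every layout it produces is composed of the same reusable patterns. That restriction is what makes the structure learnable from synthetic data and transferable to new rooms.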
Abstract
Real-world scenes, such as those in ScanNet, are difficult to capture, with highly limited data available. Generating realistic scenes with varied object poses remains an open and challenging task. In this work, we propose FactoredScenes, a framework that synthesizes realistic 3D scenes by leveraging the underlying structure of rooms while learning the variation of object poses from lived-in scenes. We introduce a factored representation that decomposes scenes into hierarchically organized concepts of room programs and object poses. To encode structure, FactoredScenes learns a library of functions capturing reusable layout patterns from which scenes are drawn, then uses large language models to generate high-level programs, regularized by the learned library. To represent scene variations, FactoredScenes learns a program-conditioned model to hierarchically predict object poses, and retrieves and places 3D objects in a scene. We show that FactoredScenes generates realistic, real-world rooms that are difficult to distinguish from real ScanNet scenes.
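The pose and retrieval stage can be sketched in the same spirit. The sketch assumes the executed layout groups objects by the library call that produced them. A two-level Gaussian perturbation stands in for the learned program-conditioned pose model: one shared offset per group, then a smaller per-object offset and yaw jitter. Retrieval simply picks a catalog asset of the matching category. All names (`predict_poses`, `retrieve`, the catalog entries, the sigma values) are hypothetical and are not the authors' implementation.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Pose:
    x: float      # floor position in meters
    y: float
    theta: float  # yaw in radians


def predict_poses(layout, rng, group_sigma=0.15, object_sigma=0.05, angle_sigma=0.1):
    """Perturb program-given poses hierarchically (stand-in for a learned model).

    layout maps a group name (e.g. one library call) to a list of
    (category, Pose) pairs. Each group gets one shared translation, then
    each object gets its own smaller translation and yaw jitter.
    """
    posed = []
    for group, members in layout.items():
        gdx, gdy = rng.gauss(0.0, group_sigma), rng.gauss(0.0, group_sigma)
        for category, p in members:
            posed.append((group, category, Pose(
                p.x + gdx + rng.gauss(0.0, object_sigma),
                p.y + gdy + rng.gauss(0.0, object_sigma),
                (p.theta + rng.gauss(0.0, angle_sigma)) % (2 * math.pi),
            )))
    return posed


def retrieve(category, catalog, rng):
    """Pick a concrete 3D asset id whose category matches."""
    candidates = [a for a in catalog if a["category"] == category]
    if not candidates:
        raise KeyError(f"no asset for category {category!r}")
    return rng.choice(candidates)["id"]


# Hypothetical asset catalog; ids are placeholders, not real dataset entries.
CATALOG = [
    {"id": "desk_01", "category": "desk"},
    {"id": "desk_02", "category": "desk"},
    {"id": "chair_01", "category": "chair"},
    {"id": "bed_01", "category": "bed"},
]

layout = {
    "bed_against_wall": [("bed", Pose(2.0, 1.0, 0.0))],
    "desk_with_chair": [("desk", Pose(0.8, 3.0, 0.0)), ("chair", Pose(0.8, 3.6, math.pi))],
}
rng = random.Random(0)
scene = [(cat, pose, retrieve(cat, CATALOG, rng)) for _, cat, pose in predict_poses(layout, rng)]
for cat, pose, asset in scene:
    print(cat, asset, pose)
```

The group offset is sampled once per group, so objects produced by the same call (a desk and its chair) move together. Their relative arrangement is preserved while the group as a whole varies. A learned model would replace these fixed Gaussians with distributions conditioned on the program.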
