Imperative vs. Declarative Programming Paradigms for Open-Universe Scene Generation
Maxim Gumin, Do Heon Han, Seung Jean Yoo, Aditya Ganeshan, R. Kenny Jones, Rio Aguina-Kang, Stewart Morris, Daniel Ritchie
TL;DR
This work challenges the prevailing declarative paradigm for open-universe 3D scene generation by proposing an imperative, LLM-driven program synthesis approach that places objects sequentially and relative to existing ones. An LLM-free error correction mechanism refines the generated programs through low-dimensional parameter updates, enhancing robustness without additional LLM calls. Through perceptual studies, the imperative method is shown to be preferred over strong declarative baselines, and a new automated evaluation metric aligns closely with human judgments. The paper provides a comprehensive evaluation protocol, analyzes the trade-offs between paradigms, discusses limitations, and outlines future directions including dynamic scenes and LLM-compatible DSL design.
Abstract
Current methods for generating 3D scene layouts from text predominantly follow a declarative paradigm, where a Large Language Model (LLM) specifies high-level constraints that are then resolved by a separate solver. This paper challenges that consensus by introducing a more direct, imperative approach. We task an LLM with generating a step-by-step program that iteratively places each object relative to those already in the scene. This paradigm simplifies the underlying scene specification language, enabling the creation of more complex, varied, and highly structured layouts that are difficult to express declaratively. To improve robustness, we complement our method with a novel, LLM-free error correction mechanism that operates directly on the generated code, iteratively adjusting parameters within the program to resolve collisions and other inconsistencies. In forced-choice perceptual studies, human participants overwhelmingly preferred our imperative layouts, choosing them over those from two state-of-the-art declarative systems 82% and 94% of the time, respectively, demonstrating the significant potential of this alternative paradigm. Finally, we present a simple automated evaluation metric for 3D scene layout generation that correlates strongly with human judgment.
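To make the two ideas in the abstract concrete, the following Python sketch illustrates (i) an imperative layout program in which each statement places one object relative to an earlier one, and (ii) an LLM-free correction loop that adjusts only the program's numeric parameters until collisions disappear. Everything in it is an assumption made for illustration and is not taken from the paper: the 2D axis-aligned `Box` footprint, the `place_right_of` helper, the overlap-area penalty, and the greedy coordinate search used as a stand-in for the actual correction mechanism, whose details are not given here.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical axis-aligned 2D footprint of a scene object."""
    name: str
    x: float  # center along x
    y: float  # center along y
    w: float  # extent along x
    d: float  # extent along y


def overlap_area(a: Box, b: Box) -> float:
    """Area of intersection of two footprints (0.0 if they do not overlap)."""
    ox = (a.w + b.w) / 2 - abs(a.x - b.x)
    oy = (a.d + b.d) / 2 - abs(a.y - b.y)
    return max(0.0, ox) * max(0.0, oy)


def place_right_of(name: str, w: float, d: float, anchor: Box, gap: float) -> Box:
    """Hypothetical placement primitive: put a new object to the right of an anchor."""
    return Box(name, anchor.x + anchor.w / 2 + gap + w / 2, anchor.y, w, d)


def build_scene(gaps):
    """An imperative 'program': objects are placed one by one, each relative
    to an object that already exists. `gaps` holds the tunable parameters."""
    bed = Box("bed", 0.0, 0.0, 2.0, 2.0)
    nightstand = place_right_of("nightstand", 0.5, 0.5, bed, gaps[0])
    lamp = place_right_of("floor_lamp", 0.4, 0.4, nightstand, gaps[1])
    return [bed, nightstand, lamp]


def penalty(scene) -> float:
    """Total pairwise overlap area; zero means the layout is collision-free."""
    return sum(overlap_area(a, b) for i, a in enumerate(scene) for b in scene[i + 1:])


def correct(gaps, step=0.05, max_iters=200, tol=1e-9):
    """Greedy coordinate search over the program parameters (an assumed
    stand-in for the correction mechanism): re-run the program with one
    parameter nudged and keep the change only if the penalty decreases."""
    gaps = list(gaps)
    best = penalty(build_scene(gaps))
    for _ in range(max_iters):
        if best <= tol:
            break
        improved = False
        for i in range(len(gaps)):
            for delta in (step, -step):
                trial = gaps.copy()
                trial[i] += delta
                p = penalty(build_scene(trial))
                if p < best:
                    gaps, best, improved = trial, p, True
                    break
        if not improved:
            break  # stuck in a local minimum; give up without an LLM call
    return gaps


if __name__ == "__main__":
    initial = [-0.3, -0.1]  # negative gaps make the objects interpenetrate
    fixed = correct(initial)
    print("penalty before:", penalty(build_scene(initial)))
    print("penalty after: ", penalty(build_scene(fixed)))
```

Because the lamp is positioned relative to the nightstand, increasing the first gap moves both objects together. This is the property that makes parameter-level correction low-dimensional in the sketch: the search space is the handful of numbers in the program, not the full set of object poses.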
