DecompDreamer: A Composition-Aware Curriculum for Structured 3D Asset Generation
Utkarsh Nath, Rajeev Goel, Rahul Khurana, Kyle Min, Mark Ollila, Pavan Turaga, Varun Jampani, Tejaswi Gowda
TL;DR
DecompDreamer reframes compositional text-to-3D generation as an optimization scheduling problem: optimizing multiple objects and their relations simultaneously produces conflicting gradients. It introduces a staged, composition-aware curriculum, implemented as a two-stage loss schedule, that first builds a global relational scaffold and then refines high-frequency object details using view-aware supervision and negative prompts. The method uses a Vision-Language Model to produce a scene graph, Gaussian Splatting as the 3D representation, and a flow-based supervision signal to stabilize training. On complex prompts it achieves state-of-the-art fidelity, disentanglement, and spatial coherence, and it scales more robustly than prior methods to multi-object 3D assets with intricate interactions.
Abstract
Current text-to-3D methods excel at generating single objects but falter on compositional prompts. We argue this failure is fundamental to their optimization schedules, as simultaneous or iterative heuristics predictably collapse under a combinatorial explosion of conflicting gradients, leading to entangled geometry or catastrophic divergence. In this paper, we reframe the core challenge of compositional generation as one of optimization scheduling. We introduce DecompDreamer, a framework built on a novel staged optimization strategy that functions as an implicit curriculum. Our method first establishes a coherent structural scaffold by prioritizing inter-object relationships before shifting to the high-fidelity refinement of individual components. This temporal decoupling of competing objectives provides a robust solution to gradient conflict. Qualitative and quantitative evaluations on diverse compositional prompts demonstrate that DecompDreamer outperforms state-of-the-art methods in fidelity, disentanglement, and spatial coherence.
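To make the idea of temporal decoupling concrete, the staged schedule can be pictured as a step-dependent weighting of two loss terms: one for inter-object relationships and one for individual objects. The sketch below is only an illustration under stated assumptions, not the actual DecompDreamer implementation. The function names (`stage_weights`, `combined_loss`), the hard switch at a fixed fraction of training, and the weight values are all hypothetical; the abstract does not specify them.

```python
def stage_weights(step, total_steps, switch_frac=0.5, minor_weight=0.1):
    """Return (relation_weight, object_weight) for a given training step.

    Hypothetical schedule: the relational term dominates before the switch
    point, and the per-object term dominates after it. The switch fraction
    and the minor weight are illustrative assumptions, not values from the paper.
    """
    if not 0 <= step < total_steps:
        raise ValueError("step must satisfy 0 <= step < total_steps")
    switch_step = int(total_steps * switch_frac)
    if step < switch_step:
        # Stage 1: build the relational scaffold.
        return 1.0, minor_weight
    # Stage 2: refine individual objects.
    return minor_weight, 1.0


def combined_loss(relation_loss, object_loss, step, total_steps):
    """Weight the two (already computed) scalar losses by the current stage."""
    w_rel, w_obj = stage_weights(step, total_steps)
    return w_rel * relation_loss + w_obj * object_loss


if __name__ == "__main__":
    total = 1000
    for step in (0, 499, 500, 999):
        print(step, stage_weights(step, total))
```

Because only one objective dominates at a time, the gradients of the two terms compete far less than they would under equal, simultaneous weighting, which is the intuition behind treating the schedule as an implicit curriculum.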
