
DecompDreamer: A Composition-Aware Curriculum for Structured 3D Asset Generation

Utkarsh Nath, Rajeev Goel, Rahul Khurana, Kyle Min, Mark Ollila, Pavan Turaga, Varun Jampani, Tejaswi Gowda

TL;DR

DecompDreamer reframes compositional text-to-3D generation as an optimization scheduling problem. It attributes failures to gradient conflicts that arise when multiple objects and relations are optimized simultaneously. It introduces a staged, composition-aware curriculum, implemented as a two-stage loss schedule. The curriculum first builds a global relational scaffold, then refines high-frequency object details with view-aware supervision and negative prompts. The method uses a Vision-Language Model to produce a scene graph, Gaussian Splatting as the 3D representation, and a flow-based supervision signal to stabilize training. It achieves state-of-the-art fidelity, disentanglement, and spatial coherence on complex prompts. Empirical results show better scalability and robustness than prior methods, enabling reliable generation of multi-object 3D assets with intricate interactions.
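The TL;DR mentions a scene graph. Figure 5 names three loss families: object, scene, and edge. The sketch below shows one plausible way a VLM-produced scene graph could be expanded into the prompts that drive those loss terms. The `SceneGraph` class, its fields, the `loss_terms` helper, and the one-term-per-node/one-term-per-edge mapping are hypothetical illustrations, not the paper's data format.

```python
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    # Hypothetical container for a VLM-produced scene graph (not the paper's format).
    prompt: str                                  # full scene prompt
    objects: list[str]                           # one prompt per object node
    edges: list[tuple[int, int, str]] = field(default_factory=list)  # (subject idx, object idx, relation)


def loss_terms(graph: SceneGraph) -> dict[str, list[str]]:
    """Enumerate the prompts that would each supervise one loss term (assumed mapping)."""
    scene = [graph.prompt]                       # one global scene term
    edge = [f"{graph.objects[i]} {rel} {graph.objects[j]}" for i, j, rel in graph.edges]
    obj = list(graph.objects)                    # one term per object node
    return {"scene": scene, "edge": edge, "object": obj}


g = SceneGraph(
    prompt="a cat sleeping on a sofa next to a lamp",
    objects=["a cat", "a sofa", "a lamp"],
    edges=[(0, 1, "sleeping on"), (1, 2, "next to")],
)
terms = loss_terms(g)
print({k: len(v) for k, v in terms.items()})  # {'scene': 1, 'edge': 2, 'object': 3}
```

For n objects, a fully connected graph could contribute up to n(n-1)/2 edge terms. Under this mapping, a four-object prompt could therefore yield six relational terms alongside four object terms and one scene term.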

Abstract

Current text-to-3D methods excel at generating single objects but falter on compositional prompts. We argue that this failure is rooted in their optimization schedules: simultaneous or iterative heuristics predictably collapse under a combinatorial explosion of conflicting gradients, leading to entangled geometry or catastrophic divergence. In this paper, we reframe the core challenge of compositional generation as one of optimization scheduling. We introduce DecompDreamer, a framework built on a novel staged optimization strategy that functions as an implicit curriculum. Our method first establishes a coherent structural scaffold by prioritizing inter-object relationships before shifting to the high-fidelity refinement of individual components. This temporal decoupling of competing objectives provides a robust solution to gradient conflict. Qualitative and quantitative evaluations on diverse compositional prompts demonstrate that DecompDreamer outperforms state-of-the-art methods in fidelity, disentanglement, and spatial coherence.
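A common way to quantify the gradient conflict described above is the cosine similarity between the gradients of two loss terms with respect to shared parameters. A negative value means that a step lowering one loss tends to raise the other. The sketch below applies this diagnostic to two toy quadratic losses on a shared 2-D parameter. The losses, targets, and parameter are invented for illustration and are not DecompDreamer's objectives.

```python
import numpy as np


def grad_quadratic(theta: np.ndarray, target: np.ndarray) -> np.ndarray:
    # Gradient of the toy loss L(theta) = 0.5 * ||theta - target||^2
    return theta - target


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity between two gradient vectors
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


theta = np.zeros(2)                  # toy shared parameters (e.g. one primitive's position)
target_rel = np.array([1.0, 0.2])    # hypothetical optimum preferred by a relational loss
target_obj = np.array([-1.0, 0.1])   # hypothetical optimum preferred by an object loss

g_rel = grad_quadratic(theta, target_rel)
g_obj = grad_quadratic(theta, target_obj)
c = cosine(g_rel, g_obj)
g_sum = g_rel + g_obj                # direction a simultaneous schedule would follow

print(f"cosine = {c:.3f} ->", "conflict" if c < 0 else "aligned")
print("summed gradient:", g_sum)     # the x-components cancel exactly
```

In this toy case, summing the two gradients cancels the x-component entirely, so the shared parameter stalls along that axis. A staged schedule avoids this particular cancellation by never activating both terms at the same step.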

Paper Structure

This paper contains 30 sections, 9 equations, 18 figures, 3 tables, 1 algorithm.

Figures (18)

  • Figure 1: Illustration of high-quality compositional 3D assets generated by DecompDreamer for complex text prompts. Existing methods often miss objects or distort spatial relationships, while DecompDreamer accurately captures geometry and preserves inter-object layout.
  • Figure 2: A visual taxonomy of optimization heuristics. (a) Holistic uses a single global loss. (b) Simultaneous applies all losses concurrently. (c) Iterative applies losses in a sequential loop. (d) Our staged curriculum temporally decouples relational and object losses.
  • Figure 3: Overview of the DecompDreamer pipeline. Given a text prompt, a VLM generates a scene graph to guide a coarse initialization. The core of our method is a composition-aware optimization curriculum that first models joint relationships to build a coherent structure, then refines individual objects to produce high-fidelity, disentangled 3D assets.
  • Figure 4: Qualitative comparison between the proposed DecompDreamer and state-of-the-art text-to-3D generators. More results can be found in the additional comparisons subsection of the appendix.
  • Figure 5: Empirical validation of optimization heuristics. The plots show the average object, scene, and edge losses for a complex 4-object text prompt ("A knight in shining armor..."). (a) GALA3D converges suboptimally, (b) GraphDreamer diverges, while (c) our staged approach (DecompDreamer) converges stably.
  • ...and 13 more figures
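The four heuristics in Figure 2 differ in which loss terms are active at each optimization step. The sketch below encodes that difference as a single function. Several details are hypothetical simplifications, not the paper's actual schedule or loss weights:

- the function `active_losses` and its term names
- the one-term-per-step rule for the iterative loop
- the `switch_step` boundary
- dropping relational terms entirely in the second stage

```python
def active_losses(strategy: str, step: int, n_objects: int, n_edges: int,
                  switch_step: int = 1000) -> list[str]:
    """Return the loss terms applied at `step` under each heuristic (hypothetical sketch)."""
    relational = ["scene"] + [f"edge_{e}" for e in range(n_edges)]
    objects = [f"object_{o}" for o in range(n_objects)]
    if strategy == "holistic":
        return ["scene"]                     # (a) a single global loss
    if strategy == "simultaneous":
        return relational + objects          # (b) every loss at every step
    if strategy == "iterative":
        terms = relational + objects
        return [terms[step % len(terms)]]    # (c) cycle through losses, one per step (assumed)
    if strategy == "staged":
        # (d) relational scaffold first, then object refinement (assumed hard switch)
        return relational if step < switch_step else objects
    raise ValueError(f"unknown strategy: {strategy}")


for s in ("holistic", "simultaneous", "iterative", "staged"):
    print(s, active_losses(s, step=1500, n_objects=2, n_edges=1))
```

A hard switch is the simplest way to decouple the two objective groups in time. A real schedule might instead anneal the relational weights or keep them at reduced strength during refinement.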