WorldCraft: Photo-Realistic 3D World Creation and Customization via LLM Agents

Xinhang Liu, Chi-Keung Tang, Yu-Wing Tai

TL;DR

WorldCraft tackles the challenge of democratizing photorealistic 3D world creation by enabling natural-language-driven generation and editing of indoor and outdoor scenes. It introduces a coordinated multi-agent pipeline with ForgeIt for object-level generation and auto-verification, ArrangeIt for hierarchical layout optimization, and a trajectory-control module for animation, all orchestrated by a central Scene Generation Coordinator. ForgeIt builds an ever-growing manual via auto-verification to guide procedural generators, while ArrangeIt casts layout as a hierarchical numerical optimization with an objective $L(\{\mathbf{p}_i, \boldsymbol{\theta}_i\}_{i=1}^n) = \sum_{j=1}^{m} \lambda_j L_j(\{\mathbf{p}_i, \boldsymbol{\theta}_i\}_{i=1}^n)$ subject to constraints $c_1, \ldots, c_k$, enabling object placement under ergonomic and aesthetic constraints, solved via simulated annealing. The results show superior consistency, aesthetics, and functionality compared with state-of-the-art baselines, and experiments demonstrate practical usability, including user studies and qualitative assessments.
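To make the weighted-sum objective concrete, the following is a minimal Python sketch of simulated annealing over a layout. It is an illustration under stated assumptions, not the authors' implementation: we read each $\mathbf{p}_i$ as a 2D position and each $\boldsymbol{\theta}_i$ as a single rotation angle, and the two loss terms (`overlap_penalty`, `wall_penalty`), the room-bounds constraint (`feasible`), and the cooling schedule are all hypothetical stand-ins for whatever terms and constraints ArrangeIt actually formulates.

```python
import math
import random

# A layout is a list of (x, y, theta) tuples, one per object (our assumption).

def overlap_penalty(layout, radius=0.5):
    """Hypothetical ergonomic term: pairwise overlap of objects modelled as discs."""
    total = 0.0
    for a in range(len(layout)):
        for b in range(a + 1, len(layout)):
            xa, ya, _ = layout[a]
            xb, yb, _ = layout[b]
            total += max(0.0, 2 * radius - math.hypot(xa - xb, ya - yb))
    return total

def wall_penalty(layout, room=4.0):
    """Hypothetical aesthetic term: distance of each object to the nearest wall."""
    return sum(min(x, room - x, y, room - y) for x, y, _ in layout)

def objective(layout, terms):
    """Weighted sum L = sum_j lambda_j * L_j(layout)."""
    return sum(weight * loss(layout) for weight, loss in terms)

def feasible(layout, room=4.0):
    """Hypothetical hard constraint: every object stays inside a square room."""
    return all(0.0 <= x <= room and 0.0 <= y <= room for x, y, _ in layout)

def anneal(layout, terms, steps=5000, t0=1.0, cooling=0.999, seed=0):
    """Simulated annealing: perturb one object per step, accept by the Metropolis rule."""
    rng = random.Random(seed)
    current = list(layout)
    cur_cost = objective(current, terms)
    best, best_cost = list(current), cur_cost
    temp = t0
    for _ in range(steps):
        i = rng.randrange(len(current))
        x, y, theta = current[i]
        candidate = list(current)
        candidate[i] = (
            x + rng.gauss(0.0, 0.2),
            y + rng.gauss(0.0, 0.2),
            (theta + rng.gauss(0.0, 0.3)) % (2 * math.pi),
        )
        if feasible(candidate):  # infeasible proposals are rejected outright
            cost = objective(candidate, terms)
            # Always accept improvements; accept worse moves with prob. exp(-delta / T).
            if cost <= cur_cost or rng.random() < math.exp((cur_cost - cost) / temp):
                current, cur_cost = candidate, cost
                if cost < best_cost:
                    best, best_cost = list(candidate), cost
        temp *= cooling  # geometric cooling schedule
    return best, best_cost
```

Calling `anneal([(2.0, 2.0, 0.0)] * 3, [(1.0, overlap_penalty), (0.5, wall_penalty)])` starts three objects stacked at the room centre and returns the best layout found together with its cost. Treating constraints as a reject-on-violation check is only one option; penalty terms folded into the objective would be an equally plausible reading of "subject to".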

Abstract

Constructing photorealistic virtual worlds has applications across various fields, but it often requires the extensive labor of highly trained professionals to operate conventional 3D modeling software. To democratize this process, we introduce WorldCraft, a system where large language model (LLM) agents leverage procedural generation to create indoor and outdoor scenes populated with objects, allowing users to control individual object attributes and the scene layout using intuitive natural language commands. In our framework, a coordinator agent manages the overall process and works with two specialized LLM agents to complete the scene creation: ForgeIt, which integrates an ever-growing manual through auto-verification to enable precise customization of individual objects, and ArrangeIt, which formulates hierarchical optimization problems to achieve a layout that balances ergonomic and aesthetic considerations. Additionally, our pipeline incorporates a trajectory control agent, allowing users to animate the scene and operate the camera through natural language interactions. Our system is also compatible with off-the-shelf deep 3D generators to enrich scene assets. Through evaluations and comparisons with state-of-the-art methods, we demonstrate the versatility of WorldCraft, ranging from single-object customization to intricate, large-scale interior and exterior scene designs. This system empowers non-professionals to bring their creative visions to life.

Paper Structure

This paper contains 13 sections, 1 equation, 9 figures, 3 tables.

Figures (9)

  • Figure 1: Overview of WorldCraft pipeline. Starting with simple text input from the user, our coordinator agent creates a 3D scene in three stages: (a) Object creation. The agent identifies objects that will appear in the scene and utilizes our ForgeIt system, or optionally, off-the-shelf deep 3D generators, to acquire the necessary assets. (b) Layout generation. The agent invokes our ArrangeIt module to design a layout that meets functional and aesthetic requirements. (c) Scene animation. Users can control objects or the camera trajectory through conversations to animate the scene and synthesize videos.
  • Figure 2: An example of user-agent and agent-agent interactions for decomposing tasks and collaboratively creating a 3D scene, demonstrating the system's capability to manage complex requests and facilitate user customization.
  • Figure 3: Manual construction procedure of ForgeIt. The critic model assigns the ForgeIt agent a text-to-3D generation task. The ForgeIt agent then synthesizes and executes a program in an attempt to generate the object. Subsequently, the critic model evaluates whether the generated object meets the task's requirements. If deemed successful, a record is committed to the manual.
  • Figure 4: Formulation of the hierarchical numerical optimization in ArrangeIt. The agent constructs an object tree to hierarchically decompose the arrangement problem into subproblems, each of which is then modeled within our optimization protocol.
  • Figure 5: Language-guided complex scene generation. Examples illustrating our method’s capability to generate expansive 3D indoor and outdoor scenes, richly populated with diverse objects.
  • ...and 4 more figures
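
The manual-construction procedure described in the Figure 3 caption (the critic assigns a task, the agent writes and runs a program, the critic judges the result, and successful attempts are committed) can be sketched as a simple loop. This is a hypothetical outline only: the `Manual` and `Record` classes and the four callables (`propose_task`, `write_program`, `run_program`, `judge`) are names introduced here for illustration, standing in for the LLM calls and the procedural generator that the paper's system would use.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional

@dataclass
class Record:
    """One verified example: the task text and the program that satisfied it."""
    task: str
    program: str

@dataclass
class Manual:
    """Growing collection of verified records (hypothetical structure)."""
    records: List[Record] = field(default_factory=list)

    def commit(self, task: str, program: str) -> None:
        self.records.append(Record(task, program))

def build_manual(
    propose_task: Callable[[Manual], str],        # critic: assign a text-to-3D task
    write_program: Callable[[str, Manual], str],  # agent: synthesize a program
    run_program: Callable[[str], Any],            # execute it to produce an object
    judge: Callable[[str, Any], bool],            # critic: does the object meet the task?
    rounds: int,
    manual: Optional[Manual] = None,
) -> Manual:
    if manual is None:
        manual = Manual()
    for _ in range(rounds):
        task = propose_task(manual)
        program = write_program(task, manual)
        try:
            result = run_program(program)
        except Exception:
            continue  # a program that fails to execute is never committed
        if judge(task, result):
            manual.commit(task, program)  # only verified attempts enter the manual
    return manual
```

Passing the manual back into `propose_task` and `write_program` reflects the idea that earlier verified records can guide later attempts; how the actual system retrieves and uses those records is not specified in the text above.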