FilmSceneDesigner: Chaining Set Design for Procedural Film Scene Generation

Zhifeng Xie, Keyi Zhang, Yiye Yan, Yuling Guo, Fan Yang, Jiting Zhou, Mengtian Li

TL;DR

Problem: manual film set design is labor-intensive and time-consuming. Approach: FilmSceneDesigner combines an agent-based chaining framework with a four-stage procedural generation pipeline. The pipeline maps natural language scene descriptions to structured parameters for floorplan/structure, material assignment, door/window placement, and object layout, and integrates assets from SetDepot-Pro. Contributions: formalizes two structure types with precise geometric representations, e.g., the structure $S = [R_1, \dots, R_n]$, the edge $E_{ij} = [x_{start}, y_{start}, x_{end}, y_{end}]$ (line) and $E_{ij} = [x_{start}, y_{start}, x_{end}, y_{end}, h_{chord}]$ (arc), the relation set $A = \{(r_i, r_j, \text{relation})\}$, and the placement rule $p(o_r) = p(o_a) + \lambda(d) \cdot \vec{v}(s)$; also introduces a hook-based data bridge. Dataset: SetDepot-Pro, with 6,862 film-specific 3D assets and 733 materials, supports semantic retrieval via Sentence-BERT. Findings: GPT-4V-based evaluations and user studies show superior alignment with cinematic goals across layout, material realism, style, and atmosphere, especially in culturally specific scenes. Significance: supports scalable, production-ready previs, construction drawings, and mood boards, improving realism and efficiency in film production workflows.
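The representations above can be made concrete with a small data model. The sketch below is illustrative, not the authors' implementation. It stores each room $R_i$ as a list of edges $E_{ij}$ and treats an arc edge as a line edge plus $h_{chord}$. It assumes $h_{chord}$ is the sagitta, i.e. how far the arc bulges from its chord midpoint. It also implements $p(o_r) = p(o_a) + \lambda(d) \cdot \vec{v}(s)$ on the floor plane. The lookup tables `DISTANCE_SCALE` (for $\lambda(d)$) and `DIRECTION` (for $\vec{v}(s)$), their keys and values, and all class and function names are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Edge:
    """A wall edge E_ij of room R_i.

    Line: [x_start, y_start, x_end, y_end]; arc: the same plus h_chord.
    ASSUMPTION: h_chord is read as the sagitta (bulge from the chord midpoint).
    """
    x_start: float
    y_start: float
    x_end: float
    y_end: float
    h_chord: float = 0.0  # 0.0 encodes a straight (line) edge

    def length(self) -> float:
        c = math.hypot(self.x_end - self.x_start, self.y_end - self.y_start)
        h = abs(self.h_chord)
        if h == 0.0 or c == 0.0:
            return c  # straight edge (or degenerate arc)
        r = (c * c / 4 + h * h) / (2 * h)             # radius from chord c and sagitta h
        theta = 2 * math.asin(min(1.0, c / (2 * r)))  # central angle of the minor arc
        if h > r:                                     # bulge past a semicircle -> major arc
            theta = 2 * math.pi - theta
        return r * theta


@dataclass
class Room:
    """One room R_i, described by its ordered wall edges."""
    name: str
    edges: List[Edge] = field(default_factory=list)

    def perimeter(self) -> float:
        return sum(e.length() for e in self.edges)


# Structure S = [R_1, ..., R_n] and relation set A = {(r_i, r_j, relation)}.
Structure = List[Room]
Relations = Set[Tuple[str, str, str]]

# HYPOTHETICAL lookup tables standing in for lambda(d) and v(s).
DISTANCE_SCALE = {"adjacent": 0.3, "near": 1.0, "far": 2.5}  # lambda(d), metres
DIRECTION = {"left": (-1.0, 0.0), "right": (1.0, 0.0),
             "front": (0.0, 1.0), "behind": (0.0, -1.0)}    # unit vectors v(s)


def place_relative(anchor_xy: Tuple[float, float], d: str, s: str) -> Tuple[float, float]:
    """p(o_r) = p(o_a) + lambda(d) * v(s), on the floor plane."""
    lam = DISTANCE_SCALE[d]
    vx, vy = DIRECTION[s]
    return (anchor_xy[0] + lam * vx, anchor_xy[1] + lam * vy)


# Usage: a hall whose north wall is a semicircular arc (chord 4, sagitta 2).
hall = Room("hall", [Edge(0, 0, 4, 0), Edge(4, 0, 4, 3),
                     Edge(4, 3, 0, 3, h_chord=2.0), Edge(0, 3, 0, 0)])
scene: Structure = [hall, Room("study")]
relations: Relations = {("hall", "study", "door")}
chair_xy = place_relative((1.0, 2.0), "near", "left")  # chair placed near-left of a desk at (1, 2)
```

Keeping line and arc edges in one record, with $h_{chord} = 0$ meaning "straight", lets downstream steps such as material area or door placement iterate over walls uniformly.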

Abstract

Film set design plays a pivotal role in cinematic storytelling and in shaping the visual atmosphere. However, the traditional process depends on expert-driven manual modeling, which is labor-intensive and time-consuming. To address this issue, we introduce FilmSceneDesigner, an automated scene generation system that emulates the professional film set design workflow. Given a natural language description that includes scene type, historical period, and style, we design an agent-based chaining framework to generate structured parameters aligned with the film set design workflow, guided by prompt strategies that ensure parameter accuracy and coherence. We then propose a procedural generation pipeline that executes a series of dedicated functions on these structured parameters for floorplan and structure generation, material assignment, door and window placement, and object retrieval and layout, ultimately constructing a complete film scene from scratch. Moreover, to enhance cinematic realism and asset diversity, we construct SetDepot-Pro, a curated dataset of 6,862 film-specific 3D assets and 733 materials. Experimental results and human evaluations demonstrate that our system produces structurally sound scenes with strong cinematic fidelity, supporting downstream tasks such as virtual previs, construction drawing, and mood board creation.
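The two-phase design (chained agents produce parameters, then procedural functions consume them) can be sketched as follows. This is an illustration under stated assumptions, not the system's actual code. The stage order follows the abstract. Each agent receives the scene description plus every earlier stage's output, which is the "chaining" idea. The stub agents, executors, parameter shapes, and the example description are all hypothetical; a real system would prompt an LLM at each agent step and build geometry at each executor step.

```python
from typing import Any, Callable, Dict

Params = Dict[str, Any]
Agent = Callable[[str, Params], Any]              # (description, upstream params) -> stage params
Executor = Callable[[Any, Dict[str, Any]], None]  # (stage params, scene) -> mutates scene

# The four stages named in the abstract, in execution order.
STAGES = ["floorplan_structure", "materials", "doors_windows", "objects"]


def run_chain(description: str, agents: Dict[str, Agent]) -> Params:
    """Call each stage agent in order; later agents see all earlier outputs."""
    params: Params = {}
    for stage in STAGES:
        params[stage] = agents[stage](description, dict(params))  # shallow copy of upstream context
    return params


def build_scene(params: Params, executors: Dict[str, Executor]) -> Dict[str, Any]:
    """Run the procedural function for each stage on its structured parameters."""
    scene: Dict[str, Any] = {}
    for stage in STAGES:
        executors[stage](params[stage], scene)
    return scene


# HYPOTHETICAL stub agents; the materials agent depends on the floorplan agent's output.
agents: Dict[str, Agent] = {
    "floorplan_structure": lambda desc, up: {"rooms": ["hall", "study"]},
    "materials": lambda desc, up: {r: "dark lacquered wood"
                                   for r in up["floorplan_structure"]["rooms"]},
    "doors_windows": lambda desc, up: [("hall", "study", "lattice door")],
    "objects": lambda desc, up: {"study": ["writing desk", "folding screen"]},
}

# HYPOTHETICAL executors that simply record each stage's result in a scene dictionary.
executors: Dict[str, Executor] = {
    "floorplan_structure": lambda p, scene: scene.update(rooms=p["rooms"]),
    "materials": lambda p, scene: scene.update(materials=p),
    "doors_windows": lambda p, scene: scene.update(openings=p),
    "objects": lambda p, scene: scene.update(objects=p),
}

params = run_chain("A 1930s Shanghai study, Art Deco style", agents)
scene = build_scene(params, executors)
```

Separating parameter generation from execution means the language model only has to produce structured values, while geometric correctness stays in deterministic code.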

Paper Structure

This paper contains 11 sections, 8 equations, 4 figures, and 5 tables.

Figures (4)

  • Figure 1: Human Film Set Design: Traditional set design involves labor-intensive script analysis, object research, and manual modeling. FilmSceneDesigner: Our system leverages an agent chain for set design to automate procedural generation, enabling efficient generation of film sets for construction drawings, mood boards, and previs.
  • Figure 2: The framework of FilmSceneDesigner. Given a scene description, FilmSceneDesigner constructs an agent-based chaining framework to generate the structured parameters, and then executes the procedural functions with these parameters for floorplan and structure, material assignment, door and window placement, and object retrieval and layout. All required assets, including materials, doors and windows, and objects, are retrieved from SetDepot-Pro to ensure high cinematic fidelity.
  • Figure 3: Retrieval Process: Textual descriptions from agent responses are encoded into vector representations using a Sentence-BERT encoder. Similarly, doors, windows, materials, and objects are encoded and stored in an embedding database. The system computes similarity scores between the query embeddings and the stored asset embeddings and retrieves the highest-scoring assets.
  • Figure 4: Qualitative comparison. A visual comparison of film scenes generated by our method, HOLODECK, and DreamScene. Our method is more expressive in film scene generation, producing scenes with distinct era, style, and regional characteristics and thereby greater historical and cinematic authenticity.
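The retrieval step in Figure 3 amounts to nearest-neighbour search over embeddings. The caption does not name the similarity score, so this sketch assumes cosine similarity, a common choice for Sentence-BERT vectors. In practice the vectors would come from a Sentence-BERT encoder; for example, the `sentence-transformers` package's `SentenceTransformer.encode` returns such embeddings. The paper's specific model is not stated here. The 3-D vectors, asset names, and function names below are hypothetical stand-ins that keep the example self-contained.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def retrieve(query: Sequence[float], database: Dict[str, Sequence[float]],
             k: int = 1) -> List[Tuple[str, float]]:
    """Score every stored asset against the query and return the top-k (id, score)."""
    scored = [(asset_id, cosine(query, vec)) for asset_id, vec in database.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# HYPOTHETICAL 3-D embeddings; real Sentence-BERT vectors have hundreds of dimensions.
asset_db = {
    "oak_panel_door": [0.9, 0.1, 0.0],
    "marble_floor": [0.1, 0.9, 0.1],
    "steel_casement_window": [0.0, 0.2, 0.9],
}
query_vec = [0.8, 0.2, 0.1]  # stands in for an encoded agent description such as "carved wooden door"
top = retrieve(query_vec, asset_db, k=3)
```

A production version would likely search only the relevant category (doors, windows, materials, or objects) and use a vector index rather than a linear scan, but the ranking logic is the same.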