Mind-of-Director: Multi-modal Agent-Driven Film Previsualization via Collaborative Decision-Making

Shufeng Nan; Mengtian Li; Sixiao Zheng; Yuwei Lu; Han Zhang; Yanwei Fu

Abstract

We present Mind-of-Director, a multi-modal agent-driven framework for film previsualization (previz) that models the collaborative decision-making process of a film production team. Given a creative idea, Mind-of-Director orchestrates multiple specialized agents to produce previz sequences within a game engine. The framework consists of four cooperative modules: Script Development, where agents iteratively draft and refine the screenplay; Virtual Scene Design, which transforms text into semantically aligned 3D environments; Character Behaviour Control, which determines character blocking and motion; and Camera Planning, which optimizes framing, movement, and composition for cinematic camera effects. A real-time visual editing system built in the game engine further enables interactive inspection and synchronized timeline adjustment across scenes, behaviours, and cameras. Extensive experiments and human evaluations show that Mind-of-Director generates high-quality, semantically grounded previz sequences in approximately 25 minutes per idea, demonstrating the effectiveness of agent collaboration for both automated prototyping and human-in-the-loop filmmaking.

Paper Structure (17 sections, 5 equations, 4 figures, 1 table, 2 algorithms)

Introduction
Related Work
Methodology
Script Development
Virtual Scene Design
Character Behaviour Control
Camera Planning
Experiments
Experiment Setup
Quantitative Analysis
Qualitative Analysis
Human Evaluation
Ablation Studies
Discussion
Limitations and Future Work
...and 2 more sections

Figures (4)

Figure 1: Traditional Previz vs. Mind-of-Director. Traditional Previz requires iterative collaboration across multiple departments (typically iterations $N \gg 1$), involving script writing, 2D storyboarding, 3D scene construction, character blocking, animatic production, and camera planning. In contrast, Mind-of-Director automates this process ($N=1$) through multi-modal agents that collaborate in real-time decision-making to generate high-quality, semantically aligned, and visually coherent previz sequences directly from an idea, enabling a single creator to prototype cinematic scenes with minimal manual effort in the game engine.
Figure 2: Overview of the Mind-of-Director Framework. Given a high-level idea, our multi-modal agent-driven framework simulates a structured collaborative decision-making workflow through four interconnected modules: (1) Script Development refines the screenplay via a Discuss-Revise-Judge process; (2) Virtual Scene Design builds consistent 3D environments using 2D-guided and rule-based generation under spatial constraints; (3) Character Behaviour Control optimizes character blocking and motion through agent feedback; (4) Camera Planning selects and validates cinematic shots via a Debate-Judge-Validation loop for physical plausibility. All modules are integrated in Unity for real-time visualization and iterative refinement.
Figure 3: Qualitative Comparison. We present a representative sample from Act $A_i$ to demonstrate our framework's performance and cross-stage consistency. The figure shows results across four stages: (a) Script Development: comparison of screenplays generated by Solo vs. Agent Collaboration; (b) Virtual Scene Design: comparison of scene layouts from StageDesigner and our approach, which has improved spatial grounding; (c) Character Behaviour Control: character positioning from FilmAgent vs. our agent-driven method; (d) Camera Planning: camera shot selection from FilmAgent vs. our approach.
Figure 4: Unity-Based Interface. Our system provides synchronized timeline tracks for characters and cameras, enabling real-time inspection, editing, and visualization across all stages.
