Automated Movie Generation via Multi-Agent CoT Planning

Weijia Wu, Zeyu Zhu, Mike Zheng Shou

TL;DR

This work tackles the challenge of automated long-form movie generation by proposing MovieAgent, a hierarchical multi-agent framework driven by internal chain-of-thought planning. It deploys specialized agents—Director, Scene Plan, and Shot Plan—to decompose a script into sub-scripts, scenes, and shot-level instructions, which are then used to generate synchronized video and audio with consistent characters. Empirical results show state-of-the-art performance on script faithfulness, character consistency, and narrative coherence, validated by both automated metrics and human evaluations on the MoviePrompts dataset. The approach promises near-zero-cost, scalable AI-assisted filmmaking with potential practical impact on entertainment production workflows.

Abstract

Existing long-form video generation frameworks lack automated planning and require manual input for storylines, scenes, cinematography, and character interactions, resulting in high costs and inefficiencies. To address these challenges, we present MovieAgent, an automated movie generation framework based on multi-agent Chain of Thought (CoT) planning. MovieAgent offers two key advantages: 1) We first explore and define the paradigm of automated movie/long-video generation. Given a script and a character bank, MovieAgent can generate multi-scene, multi-shot long-form videos with a coherent narrative, while ensuring character consistency, synchronized subtitles, and stable audio throughout the film. 2) MovieAgent introduces a hierarchical CoT-based reasoning process to automatically structure scenes, camera settings, and cinematography, significantly reducing human effort. By employing multiple LLM agents to simulate the roles of a director, screenwriter, storyboard artist, and location manager, MovieAgent streamlines the production pipeline. Experiments demonstrate that MovieAgent achieves new state-of-the-art results in script faithfulness, character consistency, and narrative coherence. Our hierarchical framework takes a step forward and provides new insights into fully automated movie generation. The code and project website are available at: https://github.com/showlab/MovieAgent and https://weijiawu.github.io/MovieAgent.

Paper Structure

This paper contains 30 sections, 4 equations, 11 figures, 4 tables.

Figures (11)

  • Figure 1: Comparison of Traditional and Automated Movie Production. Traditional filmmaking requires manual planning, while our MovieAgent automates script breakdown, scene planning, and shot design, enhancing efficiency and narrative coherence.
  • Figure 2: The Overall Pipeline for MovieAgent. The proposed framework employs a hierarchical CoT reasoning process with director, scene plan, and shot plan agents to automate long-form movie generation.
  • Figure 3: Customized Shot-Level Video Generation for MovieAgent. Current shot-level character-aware video generation approaches can be divided into three categories: (a) Keyframe-based two-stage video generation; (b) One-stage end-to-end video generation; (c) Keyframe-based joint video and audio generation.
  • Figure 4: Flowchart of the Internal Chain-of-Thought reasoning process. Through Internal CoT, various agents can process and manage narrative elements more efficiently.
  • Figure 5: More Visualizations for MovieAgent. Our MovieAgent can generate coherent storylines and detailed shots.
  • ...and 6 more figures
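
The hierarchy described in the abstract and in the Figure 2 caption (a director agent, then a scene plan agent, then a shot plan agent) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the dataclasses, the `plan_movie` function, the `Planner` signature, and the toy planners are all hypothetical names introduced here for illustration. In a real system each planner would presumably wrap an LLM call; here they are plain string functions so the example runs on its own.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical planner signature (an assumption, not from the paper):
# takes a text prompt and returns a list of text items.
Planner = Callable[[str], List[str]]


@dataclass
class Shot:
    instruction: str


@dataclass
class Scene:
    description: str
    shots: List[Shot] = field(default_factory=list)


@dataclass
class SubScript:
    text: str
    scenes: List[Scene] = field(default_factory=list)


def plan_movie(script: str, director: Planner,
               scene_planner: Planner, shot_planner: Planner) -> List[SubScript]:
    """Decompose a script top-down: sub-scripts, then scenes, then shots."""
    plan: List[SubScript] = []
    for sub_text in director(script):
        sub = SubScript(text=sub_text)
        for scene_desc in scene_planner(sub_text):
            scene = Scene(description=scene_desc)
            scene.shots = [Shot(instruction=s) for s in shot_planner(scene_desc)]
            sub.scenes.append(scene)
        plan.append(sub)
    return plan


# Toy stand-ins for the three agents (illustrative only).
def split_paragraphs(script: str) -> List[str]:
    return [p.strip() for p in script.split("\n\n") if p.strip()]


def split_sentences(text: str) -> List[str]:
    return [s.strip() for s in text.split(".") if s.strip()]


def wide_then_close(scene: str) -> List[str]:
    return [f"Wide shot: {scene}", f"Close-up: {scene}"]


if __name__ == "__main__":
    demo = plan_movie("A meets B. They argue.\n\nB leaves.",
                      split_paragraphs, split_sentences, wide_then_close)
    for sub in demo:
        for scene in sub.scenes:
            print(scene.description, [shot.instruction for shot in scene.shots])
```

Passing the planners in as arguments keeps the three levels independent, so any one stand-in could be swapped for a model-backed planner without touching the others.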