CinePreGen: Camera Controllable Video Previsualization via Engine-powered Diffusion

Yiran Chen, Anyi Rao, Xuekun Jiang, Shishi Xiao, Ruiqing Ma, Zeyu Wang, Hui Xiong, Bo Dai

TL;DR

CinePreGen addresses a core gap in AI-driven video previsualization: enabling precise, cinematic camera motion within generative workflows. By fusing game-engine ground truth with diffusion rendering through the CineSpace camera space, a storyboard-driven pipeline, and region-based prompts, it achieves coherent, motion-rich video outputs. A within-subjects study with 12 participants demonstrates superior usability, consistency, and perceived realism of camera movements compared to a 2D baseline tool, validating its practical impact. The work offers a tangible path to integrating traditional cinematography practice with modern AI techniques, benefiting both individual creators and industry professionals.

Abstract

With advancements in video generative AI models (e.g., SORA), creators are increasingly using these techniques to enhance video previsualization. However, they face challenges with incomplete and mismatched AI workflows. Existing methods mainly rely on text descriptions and struggle with camera placement, a key component of previsualization. To address these issues, we introduce CinePreGen, a visual previsualization system enhanced with engine-powered diffusion. It features a novel camera and storyboard interface that offers dynamic control, from global to local camera adjustments. This is combined with a user-friendly AI rendering workflow, which aims to achieve consistent results through multi-masked IP-Adapter and engine simulation guidelines. In our comprehensive evaluation study, we demonstrate that our system reduces development viscosity (i.e., the complexity and challenges in the development process), meets users' needs for extensive control and iteration in the design process, and outperforms other AI video production workflows in cinematic camera movement, as shown by our experiments and a within-subjects user study. With its intuitive camera controls and realistic rendering of camera motion, CinePreGen shows great potential for improving video production for both individual creators and industry professionals.

Paper Structure (28 sections, 10 figures)

Figures (10)

  • Figure 1: CinePreGen consists of two sub-modules: the layout design module (a) and the AI rendering module (b). (a1) The 3D viewer of the entire scene. (a2) The current active camera viewer. (a3) The camera editor. (a4) The storyboard editor. (a5) The preview viewer. (a6) The timeline. (b1) The shot prompts panel. (b2) The environment settings panel. (b3) The character settings panel. (b4) The output preview panel.
  • Figure 2: CineSpace, a novel and efficient representation of the camera parameter space, which defines the camera's behavior by three key parameters $(d, \theta, \varphi)$.
  • Figure 3: An example of yaw rotation in CinePreGen, where the camera uses two characters as the visual focus. As the camera yaws, it rotates based on the CineSpace coordinate system, ensuring that both characters remain in the frame throughout the movement.
  • Figure 4: Diffusion rendering workflow. The process starts with obtaining raw footage from the engine, followed by exporting ground truth data (e.g., depth maps and pose images), applying masks for targeted control, and ensuring consistency in visual style and character identity using AnimateDiff and IP-Adapter.
  • Figure 5: Participants' work created with CinePreGen. The left side shows the original footage of camera movements generated in the engine; the right side shows the rendered results, annotated with the shot type and prompts.
  • ...and 5 more figures
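
The captions for Figures 2 and 3 describe a camera specified by three parameters $(d, \theta, \varphi)$ and a yaw that keeps two characters in frame. The sketch below shows one plausible reading of that idea, not the paper's implementation: it assumes $d$ is the distance to a focus point, $\theta$ an azimuth, and $\varphi$ an elevation (ordinary spherical coordinates, Y-up), and that the focus is the midpoint between the two characters. The function names `cinespace_to_position`, `look_direction`, and `midpoint` are hypothetical.

```python
import math

def midpoint(a, b):
    """Midpoint of two 3D points, used here as an assumed visual focus."""
    return tuple((p + q) / 2 for p, q in zip(a, b))

def cinespace_to_position(focus, d, theta, phi):
    """Place a camera at distance d from focus.

    Assumption: theta is azimuth and phi is elevation, both in radians,
    in a Y-up coordinate system. The paper may define them differently.
    """
    fx, fy, fz = focus
    x = fx + d * math.cos(phi) * math.sin(theta)
    y = fy + d * math.sin(phi)
    z = fz + d * math.cos(phi) * math.cos(theta)
    return (x, y, z)

def look_direction(position, focus):
    """Unit vector pointing from the camera position toward the focus."""
    dx, dy, dz = (f - p for f, p in zip(focus, position))
    norm = math.sqrt(dx * dx + dy * dy + dz * dz)
    return (dx / norm, dy / norm, dz / norm)

# Two characters standing two units apart at head height (illustrative values).
focus = midpoint((-1.0, 1.6, 0.0), (1.0, 1.6, 0.0))

# Sweep the azimuth from 0 to 90 degrees while holding d and phi fixed.
orbit = [
    cinespace_to_position(focus, 4.0, math.radians(deg), math.radians(10))
    for deg in range(0, 91, 30)
]
directions = [look_direction(p, focus) for p in orbit]
```

Because only the azimuth changes, every camera position in `orbit` stays at the same distance from the focus and keeps looking at it, which is one way a yaw could keep both characters framed throughout the move.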