
PrevizWhiz: Combining Rough 3D Scenes and 2D Video to Guide Generative Video Previsualization

Erzhen Hu, Frederik Brudy, David Ledo, George Fitzmaurice, Fraser Anderson

TL;DR

PrevizWhiz tackles the gap between speed-focused storyboards and asset-heavy 3D previz by uniting rough 3D blocking with generative AI restyling and motion guidance. The system supports three fidelity levels and integrates a video playground for external-motion guidance, enabling rapid, multi-modal previews that communicate intent to stakeholders. A user study with 10 industry professionals shows improved iteration speed and cross-disciplinary collaboration, while revealing concerns about controllability, continuity, and the impact of AI on labor. The work demonstrates that AI-assisted previz can democratize creative exploration and collaboration in filmmaking, while underscoring the need for transparent provenance, attribution, and careful integration into existing pipelines.

Abstract

In pre-production, filmmakers and 3D animation experts must rapidly prototype ideas to explore a film's possibilities before full-scale production, yet conventional approaches involve trade-offs between efficiency and expressiveness. Hand-drawn storyboards often lack the spatial precision needed for complex cinematography, while 3D previsualization demands expertise and high-quality rigged assets. To address this gap, we present PrevizWhiz, a system that combines rough 3D scenes with generative image and video models to create stylized video previews. The workflow integrates frame-level image restyling with adjustable resemblance, time-based editing through motion paths or external video inputs, and refinement into high-fidelity video clips. A study with filmmakers demonstrates that our system lowers technical barriers, accelerates creative iteration, and effectively bridges the communication gap between collaborators, while also surfacing challenges of continuity, authorship, and ethical considerations in AI-assisted filmmaking.
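The abstract mentions time-based editing through motion paths. As a rough illustration only, and assuming a motion path is a list of timed keyframes with linear interpolation between them, sampling an element's position on the timeline could look like the sketch below. The `Keyframe` type and `sample_motion_path` function are hypothetical and are not PrevizWhiz's actual API or interpolation scheme.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Keyframe:
    # Hypothetical keyframe: a timeline time and a 3D position.
    time: float                            # seconds on the timeline
    position: tuple[float, float, float]   # x, y, z in scene units


def sample_motion_path(keyframes: list[Keyframe], t: float) -> tuple[float, float, float]:
    """Linearly interpolate an element's position at time t along a keyframed path.

    Assumption: linear interpolation; a real previz tool may use splines or easing.
    """
    if not keyframes:
        raise ValueError("motion path needs at least one keyframe")
    frames = sorted(keyframes, key=lambda k: k.time)
    # Outside the keyed range, hold the first or last pose.
    if t <= frames[0].time:
        return frames[0].position
    if t >= frames[-1].time:
        return frames[-1].position
    times = [k.time for k in frames]
    i = bisect_right(times, t)  # guarantees frames[i-1].time <= t < frames[i].time
    a, b = frames[i - 1], frames[i]
    u = (t - a.time) / (b.time - a.time)  # normalized progress between the two keys
    return tuple(pa + u * (pb - pa) for pa, pb in zip(a.position, b.position))
```

For example, with keys at (0, 0, 0) at 0 s and (4, 0, 2) at 2 s, sampling at 1 s yields the midpoint (2.0, 0.0, 1.0); a camera track could reuse the same sampler for its position channel.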

Paper Structure

This paper contains 68 sections, 12 figures, and 1 table.

Figures (12)

  • Figure 1: PrevizWhiz Scene Blocking and Composition Overview: (a) Sequence and scene selection, where users can select different sequences and scenes. (b) 3D Environment Panel for (b1) setting up cameras, lenses, and lighting, and (b2) exploring the 3D scene with pan/tilt/orbit controls. (c) Timeline Panel for blocking camera, avatar, and element movements, including (c1) camera tracks, (c2) element animations, (c3) fixed/movable elements, (c4) color adjustments, and (c5) clip editing; (c6) shows a clip with restyled images attached. (d) Image Style Panel, where (d1) the 3D input and 2D output can be compared, with prompt inputs including (d2) descriptions of customized characters and (d3) background prompts, either entered directly or composed. (e) Inpainting composer tools, with (e1) brush tools for marking (e2) editable regions and (e3) text prompts describing the target details, which produce (e4) applied details such as painted stencil text. (f) Prompt Composer with (f1) basic background scene descriptions, (f2) visual style options, and (f3) mood/tone settings. (g) Resemblance control to balance resemblance against creativity in the generated output. The full video and scene output for this scene are shown in the Appendix.
  • Figure 2: Examples of adherence to source color in image generation, with visual styles enabled by LoRAs: (a) Cinematic; (b) 3D Cartoon. Each row shows four resemblance levels (Strict, Faithful, Flexible, and Loose), which progressively relax adherence to the original 3D input (column 0). The FE values above each column are FlowEdit parameters and the CN values are ControlNet parameters; together they control how strongly the generated images follow the source 3D frame.
  • Figure 3: Scene with lighting variations: (a) Sunny Day, (b) Dawn/Early Morning, and (c) Dark Room, each shown at four resemblance levels.
  • Figure 4: The Video Style Interface, demonstrated with a three-person interaction example: (a) Video Import Panel: users can (a1) import online or live video footage and crop it, then (a2) process it into skeleton videos. These skeletons can then be (a3) dragged onto the timeline, where (a5) a new video track is created to guide character movements in the scene. (b) Video Remix Editor: provides tools for manipulating, aligning, and refining processed video layers to match character positions in the 3D scene (e.g., orientation, gestures). Users can (b1) resize and reposition clips, (b2) split them, and (b3) arrange the split segments to match character scale before recompositing them into a guiding video. (c) Video Style Panel: includes (c1) the processed external video inputs; (c2) image descriptions and references from the Image Style Panel; (c3) a video description field where users can specify additional movement details; and (c4) a resemblance–creativity control, where Resemble follows the spatially defined motion, while Creative generates outputs based on the text prompt.
  • Figure 5: PrevizWhiz Walkthrough Example: (a) Scene setup. The director positions two characters in the 3D blocking panel, adjusts props, lighting, and color, and configures complementary camera angles. (b) Camera and Motion Authoring. The director first defines rough motion guidance and camera placement (1–6), then applies granular motion guidance using control-video references and skeleton alignment to refine gestures, body posture, and interaction timing. (c) Generated Shot and Style Authoring. The Image Styling panel previews lighting presets at different resemblance levels (Original, Strict, Faithful) to explore visual tone. Granular generative video output further blends the animated 3D blocking with external motion footage, enhancing realism, lighting continuity, and character interactions across the final sequence.
  • ...and 7 more figures
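Figure 4's Video Remix Editor lets users resize, reposition, split, and arrange processed skeleton clips before recompositing them into a guiding video. The sketch below shows one plausible way to model that bookkeeping. It assumes each clip records a source time range, a timeline start, and a 2D transform. The `Clip`, `split_clip`, and `place` names and all of their fields are hypothetical; they are not taken from PrevizWhiz.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Clip:
    # Hypothetical clip record for a processed skeleton video segment.
    source_in: float    # start time within the source skeleton video (s)
    source_out: float   # end time within the source skeleton video (s)
    start: float        # placement on the guiding-video timeline (s)
    x: float = 0.0      # horizontal offset in the guiding frame (px)
    y: float = 0.0      # vertical offset in the guiding frame (px)
    scale: float = 1.0  # uniform scale, e.g. to match a character's size

    @property
    def duration(self) -> float:
        return self.source_out - self.source_in


def split_clip(clip: Clip, at: float) -> tuple[Clip, Clip]:
    """Split a clip at timeline time `at` into two contiguous segments."""
    offset = at - clip.start
    if not 0 < offset < clip.duration:
        raise ValueError("split point must fall strictly inside the clip")
    first = replace(clip, source_out=clip.source_in + offset)
    second = replace(clip, source_in=clip.source_in + offset, start=at)
    return first, second


def place(clip: Clip, *, x: float, y: float, scale: float) -> Clip:
    """Reposition and resize a segment, e.g. to align it with one character."""
    return replace(clip, x=x, y=y, scale=scale)
```

In a three-person scene like the one in Figure 4, one could split a single source clip and `place` each segment over a different character. Keeping clips immutable makes each edit easy to undo, which is a design choice of this sketch rather than a documented property of the system.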