VideoDiff: Human-AI Video Co-Creation with Alternatives

Mina Huh, Dingzeyu Li, Kim Pimmel, Hijung Valentina Shin, Amy Pavel, Mira Dontcheva

TL;DR

VideoDiff addresses the challenge of reviewing and selecting among numerous AI-generated video editing alternatives. It introduces a co-creative tool that generates multiple suggestions for rough cuts, B-rolls, and text effects, and provides aligned, multi-view diff visualizations (timeline, transcript) to support sensemaking. In a formative study and a within-subject user evaluation (N=12), VideoDiff reduced comparison time, lowered cognitive load, and increased user satisfaction and perceived usefulness for video authoring, with some participants expressing concerns about expressiveness and control. The work demonstrates the practical value of structured alternative management in video editing and outlines future directions toward broader editing tasks, multimodal inputs, personalization, and accessibility.

Abstract

To make an engaging video, people sequence interesting moments and add visuals such as B-rolls or text. While video editing requires time and effort, AI has recently shown strong potential to make editing easier through suggestions and automation. A key strength of generative models is their ability to quickly generate multiple variations, but when provided with many alternatives, creators struggle to compare them to find the best fit. We propose VideoDiff, an AI video editing tool designed for editing with alternatives. With VideoDiff, creators can generate and review multiple AI recommendations for each editing process: creating a rough cut, inserting B-rolls, and adding text effects. VideoDiff simplifies comparisons by aligning videos and highlighting differences through timelines, transcripts, and video previews. Creators have the flexibility to regenerate and refine AI suggestions as they compare alternatives. Our study participants (N=12) could easily compare and customize alternatives, creating more satisfying results.

Paper Structure

This paper contains 25 sections, 10 figures, and 5 tables.

Figures (10)

  • Figure 1: Overview of VideoDiff: Users can view an outline of the variations in the current editing stage (a). In this figure, we see 10 rough cut variations. The user can play videos of these different versions (b) and compare them in the transcript or timeline view (c). Users can toggle between the edited and source timelines (d) to align videos to the source or edited context, or click on a section to navigate directly to that part of the video (e). Users can also sort variations by duration and by the number of sections included, as well as pin, archive, or edit variations according to their preferences (f).
  • Figure 2: At each editing stage, VideoDiff provides glanceable timelines for users to easily compare different variations. Users can click on any section, B-roll image, or text effect to jump to that part of the video and preview the effects.
  • Figure 3: Users can switch between the edited and original source timeline in the transcript (a) and timeline (b) views. This helps users see the edits in the context of the source content and compare which sections are included at a glance. The source view (c) shows users where the edited content sits within the source video.
  • Figure 4: At each editing stage, VideoDiff provides glanceable transcripts so that users can easily review and compare different variations. In rough cut transcripts, visually concrete keywords [leake2020generating] are emphasized in bold, allowing users to easily skim through the content of each variation. Users can click on section headings, B-roll images, or text effects to jump to that part of the video and preview the effects.
  • Figure 5: Using VideoDiff, users can edit a variation, recombine multiple variations, and generate a new variation using text prompts. For each new generation, VideoDiff summarizes the changes so that users can easily verify the result.
  • ...and 5 more figures
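
The captions for Figures 1 and 3 describe aligning rough cut variations to the source timeline and sorting them by duration or by number of included sections. The sketch below illustrates one way such a comparison could be modeled. It is not the authors' implementation: the `Section` and `Variation` classes, the `source_alignment` and `sort_variations` helpers, and the sample data are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Section:
    """A contiguous span of the source video, in seconds (hypothetical model)."""
    label: str
    start: float
    end: float

    @property
    def duration(self) -> float:
        return self.end - self.start


@dataclass
class Variation:
    """One rough cut alternative: an ordered selection of source sections."""
    name: str
    sections: list

    @property
    def duration(self) -> float:
        return sum(s.duration for s in self.sections)


def source_alignment(source, variations):
    """For each source section, list the variations that include it.

    This mirrors the idea of a "source timeline" view: every variation is
    laid out against the same source sections so inclusions line up.
    """
    rows = []
    for section in source:
        included = [v.name for v in variations if section in v.sections]
        rows.append((section.label, included))
    return rows


def sort_variations(variations, key="duration"):
    """Sort variations by total duration or by number of included sections."""
    if key == "duration":
        return sorted(variations, key=lambda v: v.duration)
    if key == "sections":
        return sorted(variations, key=lambda v: len(v.sections))
    raise ValueError(f"unknown sort key: {key}")


# Illustrative data only; not taken from the paper.
source = [
    Section("Intro", 0.0, 10.0),
    Section("Demo", 10.0, 40.0),
    Section("Q&A", 40.0, 50.0),
    Section("Outro", 50.0, 60.0),
]
variations = [
    Variation("A", [source[0], source[1], source[3]]),
    Variation("B", [source[1], source[2]]),
    Variation("C", [source[0], source[3]]),
]

if __name__ == "__main__":
    names = [v.name for v in variations]
    for label, included in source_alignment(source, variations):
        # One column per variation; "-" marks a section the variation omits.
        marks = " ".join(n if n in included else "-" for n in names)
        print(f"{label:<6} {marks}")
```

Representing each variation as a selection of shared source sections keeps the alignment step a simple membership check. A real system would likely need to handle partially overlapping or reordered clips, which this sketch does not attempt.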