Videogenic: Identifying Highlight Moments in Videos with Professional Photographs as a Prior

David Chuan-En Lin, Fabian Caba Heilbron, Joon-Young Lee, Oliver Wang, Nikolas Martelaro

TL;DR

This paper presents Videogenic, a technique for creating domain-specific highlight videos across a diverse range of domains, and shows that a high-quality photograph collection combined with CLIP-based retrieval can serve as an excellent prior for finding video highlights.

Abstract

This paper investigates the challenge of extracting highlight moments from videos. To perform this task, we need to understand what constitutes a highlight for arbitrary video domains while at the same time being able to scale across different domains. Our key insight is that photographs taken by photographers tend to capture the most remarkable or photogenic moments of an activity. Drawing on this insight, we present Videogenic, a technique capable of creating domain-specific highlight videos for a diverse range of domains. In a human evaluation study (N=50), we show that a high-quality photograph collection combined with CLIP-based retrieval (which uses a neural network with semantic knowledge of images) can serve as an excellent prior for finding video highlights. In a within-subjects expert study (N=12), we demonstrate the usefulness of Videogenic in helping video editors create highlight videos with lighter workload, shorter task completion time, and better usability.

Paper Structure

This paper contains 42 sections, 1 equation, 14 figures, 1 table.

Figures (14)

  • Figure 1: Automatic classifier. Given the frames of a video and a database of activity labels, Videogenic performs pairwise comparisons to predict the primary activity of the video.
  • Figure 2: Computing highlight scores. Given an activity label (e.g., skydiving), Videogenic retrieves 10 stock photographs and computes the average photograph representation. Given each frame of a video and the average photograph, Videogenic performs pairwise comparisons to predict a highlight score for each frame.
  • Figure 3: Highlight graph. The highlight graph visualizes the distribution of predicted highlight scores across the video (a). The user may scrub through the graph to inspect a corresponding video frame and its highlight score (b).
  • Figure 4: Example video frames and their corresponding highlight scores within a long skydiving video, using the keyword skydiving. The top-left corner displays the photograph collection used by Videogenic.
  • Figure 5: Example video frames and their corresponding highlight scores within a long skydiving video, using the keyword skydiving landing. The top-right corner displays the photograph collection used by Videogenic.
  • ...and 9 more figures
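
The captions for Figures 1 and 2 describe two comparison steps: predicting the primary activity of a video, and scoring each frame against an averaged photograph representation. The following is a minimal sketch of how those steps might look, not the authors' implementation. It assumes the frame, photograph, and label embeddings have already been computed by a CLIP-style encoder (not shown), and it assumes that the "pairwise comparison" is cosine similarity, which the captions do not specify. All function names (`highlight_scores`, `classify_activity`, and the helpers) are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; assumed here as the pairwise comparison."""
    na, nb = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(na, nb))


def mean_embedding(embeddings: Sequence[Vector]) -> List[float]:
    """Average unit-normalized embeddings into one 'average photograph' vector."""
    unit = [normalize(e) for e in embeddings]
    dim = len(unit[0])
    return [sum(e[i] for e in unit) / len(unit) for i in range(dim)]


def highlight_scores(frames: Sequence[Vector], photos: Sequence[Vector]) -> List[float]:
    """Score each frame by its similarity to the averaged photograph embedding."""
    prior = mean_embedding(photos)
    return [cosine(frame, prior) for frame in frames]


def classify_activity(frames: Sequence[Vector], labels: Dict[str, Vector]) -> str:
    """Pick the label whose embedding is, on average, most similar to the frames."""
    def mean_similarity(label_vec: Vector) -> float:
        return sum(cosine(frame, label_vec) for frame in frames) / len(frames)

    return max(labels, key=lambda name: mean_similarity(labels[name]))


# Toy 2-D vectors standing in for real embeddings (illustrative values only).
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
photos = [[2.0, 0.0], [1.0, 0.0]]
labels = {"skydiving": [1.0, 0.2], "cooking": [-1.0, 0.0]}

scores = highlight_scores(frames, photos)
activity = classify_activity(frames, labels)
```

In this sketch the highest-scoring frame is the one whose embedding points in the same direction as the averaged photographs. Normalizing each photograph before averaging keeps any single image from dominating the prior; whether Videogenic does this is not stated in the captions.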