Using Saliency and Cropping to Improve Video Memorability

Vaibhav Mudgal, Qingyang Wang, Lorin Sweeney, Alan F. Smeaton

TL;DR

This paper investigates whether saliency-guided frame cropping can actively enhance video memorability for short clips. It combines a CLIP-based memorability predictor with three saliency-based cropping strategies, using DeepGaze IIE to produce saliency maps and a Bayesian Ridge Regressor to score memorability, and evaluates them on a 1,500-video subset of the Memento10k dataset. The key finding is that cropping improves memorability primarily for videos with low initial memorability; fixed and variable saliency tracking offer similar benefits, with diminishing returns for highly memorable videos. The work demonstrates a practical, lightweight approach to manipulating video memorability and points to future work on more sophisticated visual manipulations to further boost memorability, especially for highly memorable content.

Abstract

Video memorability is a measure of how likely a particular video is to be remembered by a viewer when that viewer has no emotional connection with the video content. It is an important characteristic, as videos that are more memorable are more likely to be shared, viewed, and discussed. This paper presents the results of a series of experiments in which we improved the memorability of a video by selectively cropping frames based on image saliency. We present results for basic fixed cropping as well as for dynamic cropping, where both the size of the crop and its position within the frame change as the video plays and saliency is tracked. Our results indicate that the memorability score can be improved, especially for videos of low initial memorability.
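
To make the idea of a saliency-tracked, fixed-size crop concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names `saliency_centre` and `fixed_crop_box` are hypothetical, the saliency map is assumed to be a 2D grid of non-negative values (for example, the output of a saliency model resized to the frame), and the choice of a saliency-weighted centroid as the "centre point" is an assumption.

```python
from typing import List, Tuple

SaliencyMap = List[List[float]]  # rows of non-negative saliency values


def saliency_centre(saliency: SaliencyMap) -> Tuple[float, float]:
    """Return the saliency-weighted centroid as (x, y) in pixel coordinates."""
    height, width = len(saliency), len(saliency[0])
    total = sum(sum(row) for row in saliency)
    if total == 0:
        # No salient pixels: fall back to the geometric centre of the frame.
        return (width - 1) / 2, (height - 1) / 2
    cx = sum(x * v for row in saliency for x, v in enumerate(row)) / total
    cy = sum(y * sum(row) for y, row in enumerate(saliency)) / total
    return cx, cy


def fixed_crop_box(
    centre: Tuple[float, float],
    frame_w: int,
    frame_h: int,
    crop_w: int,
    crop_h: int,
) -> Tuple[int, int, int, int]:
    """Centre a crop_w x crop_h window on `centre`, shifted to stay in frame.

    Returns (left, top, right, bottom) with right and bottom exclusive.
    """
    cx, cy = centre
    left = int(round(cx - crop_w / 2))
    top = int(round(cy - crop_h / 2))
    # Clamp so the window never extends past the frame borders.
    left = max(0, min(left, frame_w - crop_w))
    top = max(0, min(top, frame_h - crop_h))
    return left, top, left + crop_w, top + crop_h
```

Applying `saliency_centre` to each frame and passing the result to `fixed_crop_box` yields a crop window that follows the salient region while keeping a constant size; in practice the per-frame centres would likely need temporal smoothing to avoid jitter.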

Paper Structure (15 sections, 2 equations, 7 figures, 1 table)

Figures (7)

  • Figure 1: Memorability scores predicted by the model of Sweeney et al. (2021) vs. manually determined memorability scores in the Memento10k dataset, for 1,500 test videos.
  • Figure 2: Sample frames from 3 videos, memorability scores for those frames, and average memorability scores for the videos.
  • Figure 3: Video frame (top left) and its generated saliency map with the centre point of the saliency spread marked as a point (top right). The image on the left also shows the saliency map at two different thresholds. The graph on the right shows the movement of the centre point of saliency for the duration of the video.
  • Figure 4: Sample frames for fixed-sized (second approach) and for variable-sized cropping (third approach) with saliency tracking.
  • Figure 5: Changes in predicted video memorability as a result of varying crop sizes, where a crop size of 90% means discarding 10% of the frame.
  • ...and 2 more figures
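
The captions for Figures 3 to 5 mention thresholded saliency maps, variable-sized crops, and crop sizes expressed as a percentage of the frame. The sketch below shows one way these pieces could fit together; it is an illustration under stated assumptions, not the paper's method. The names `salient_bounding_box` and `retained_fraction` are hypothetical, the threshold is assumed to be a fraction of the peak saliency value, and "crop size" is assumed to mean the fraction of frame area kept.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def salient_bounding_box(
    saliency: List[List[float]], threshold_fraction: float = 0.5
) -> Box:
    """Bounding box of pixels with saliency >= threshold_fraction * peak.

    Assumption: the variable-sized crop is taken as the tight box around
    the thresholded saliency map. Falls back to the full frame when no
    pixel is salient.
    """
    height, width = len(saliency), len(saliency[0])
    peak = max(max(row) for row in saliency)
    cutoff = threshold_fraction * peak
    xs, ys = [], []
    for y, row in enumerate(saliency):
        for x, value in enumerate(row):
            if value > 0 and value >= cutoff:
                xs.append(x)
                ys.append(y)
    if not xs:
        return 0, 0, width, height
    return min(xs), min(ys), max(xs) + 1, max(ys) + 1


def retained_fraction(box: Box, frame_w: int, frame_h: int) -> float:
    """Fraction of the frame area kept by the crop (0.9 = discard 10%)."""
    left, top, right, bottom = box
    return (right - left) * (bottom - top) / (frame_w * frame_h)
```

A lower `threshold_fraction` admits more pixels and therefore tends to give a larger crop, so the threshold acts as a knob for how aggressively the frame is trimmed.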