Shot Segmentation Based on Von Neumann Entropy for Key Frame Extraction

Xueqing Zhang, Di Fu, Naihao Liu

TL;DR

This work tackles efficient video key frame extraction by performing shot segmentation via Von Neumann entropy on a frame similarity matrix built from perceptual hashes. The first frame of each detected shot is chosen as a key frame, enabling automatic determination of the number of key frames without prior knowledge, with a practical $O(n^2)$ overall runtime (and an $O(n^3)$ option). The method leverages temporal information and an entropy-minimizing segmentation objective, solved efficiently via beam search. Experimental results on Open Video and TikTok data show superior effective information rate and lower redundancy compared to density-peak clustering, demonstrating stable performance across video lengths and content types.
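As a rough illustration of the "frame similarity matrix built from perceptual hashes", the sketch below uses a simplified average hash and a Hamming-distance similarity. The summary does not say which perceptual hash or similarity measure the authors use, so `average_hash`, `similarity_matrix`, and the `hash_size` parameter are hypothetical names chosen for this example only.

```python
import numpy as np

def average_hash(frame, hash_size=8):
    """Simplified average hash (an assumption, not necessarily the paper's hash).

    frame: 2-D grayscale array whose sides are at least hash_size pixels.
    Returns a flat boolean array of hash_size * hash_size bits.
    """
    h, w = frame.shape
    bh, bw = h // hash_size, w // hash_size
    # Crop so the frame divides evenly into hash_size x hash_size blocks.
    cropped = frame[:bh * hash_size, :bw * hash_size].astype(np.float64)
    # Block-mean downsampling to a hash_size x hash_size thumbnail.
    blocks = cropped.reshape(hash_size, bh, hash_size, bw).mean(axis=(1, 3))
    # One bit per block: is it brighter than the thumbnail's mean?
    return (blocks > blocks.mean()).ravel()

def similarity_matrix(hashes):
    """Pairwise similarity = 1 - normalized Hamming distance between hashes."""
    H = np.asarray(hashes, dtype=np.int32)            # shape (n_frames, n_bits)
    dist = (H[:, None, :] != H[None, :, :]).sum(axis=2)
    return 1.0 - dist / H.shape[1]

# Usage: two identical gradient frames and one inverted frame.
gradient = np.tile(np.arange(16), (16, 1))
frames = [gradient, gradient.copy(), 15 - gradient]
S = similarity_matrix([average_hash(f) for f in frames])
print(S)
```

Computing all pairs takes a number of hash comparisons quadratic in the number of frames. Frames from the same shot should produce similarities close to 1, which is what gives the matrix its block structure along the diagonal.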

Abstract

Video key frame extraction is important in various fields, such as video summarization, retrieval, and compression. We propose a video key frame extraction algorithm based on shot segmentation using Von Neumann entropy. Shots are segmented by computing the Von Neumann entropy of the similarity matrix among frames in the video sequence. The first frame of each shot is selected as a key frame, which incorporates the temporal order of the frames. Experimental results show that the extracted key frames represent the original video content fully and accurately while minimizing the number of repeated frames.
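To make the entropy computation concrete, here is a minimal sketch of the Von Neumann entropy of a similarity matrix, using the standard definition $S = -\sum_i \lambda_i \log \lambda_i$ over the eigenvalues $\lambda_i$ of a unit-trace matrix. The normalization and the clipping of negative eigenvalues are assumptions made for this example; the abstract does not specify how the authors normalize the matrix, and `von_neumann_entropy` is a hypothetical helper name.

```python
import numpy as np

def von_neumann_entropy(S, eps=1e-12):
    """Von Neumann entropy of a symmetric similarity matrix.

    Assumptions of this sketch: negative eigenvalues are clipped to zero
    (a similarity matrix need not be positive semidefinite), and the
    remaining eigenvalues are rescaled to sum to 1.
    """
    S = np.asarray(S, dtype=np.float64)
    S = (S + S.T) / 2.0                      # enforce exact symmetry
    eig = np.clip(np.linalg.eigvalsh(S), 0.0, None)
    total = eig.sum()
    if total <= 0.0:
        return 0.0
    p = eig / total
    p = p[p > eps]                           # treat 0 * log(0) as 0
    return float(-(p * np.log(p)).sum())

# One shot of 4 identical frames: rank-1 matrix, entropy close to 0.
one_shot = np.ones((4, 4))
# Four mutually dissimilar frames: identity matrix, entropy log(4).
all_different = np.eye(4)
# Two shots of 2 identical frames each: entropy log(2).
two_shots = np.kron(np.eye(2), np.ones((2, 2)))

for name, M in [("one_shot", one_shot),
                ("all_different", all_different),
                ("two_shots", two_shots)]:
    print(name, round(von_neumann_entropy(M), 4))
```

A group of mutually similar frames yields a nearly rank-1 block with entropy near zero, while mixing dissimilar frames raises the entropy. This is one way to see why low entropy within each segment is a plausible signal for shot boundaries.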

Paper Structure (16 sections, 10 equations, 5 figures, 1 table, 1 algorithm)


Figures (5)

  • Figure 1: The procedure of our approach.
  • Figure 2: We use a gray-scale image to represent the similarity matrix. The color closer to black indicates higher similarity between images, while the color closer to white indicates lower similarity.
  • Figure 3: The entropy change of a randomly selected test video to help analyze the stopping condition. The test video has 1,300 frames when sampled at 2 FPS.
  • Figure 4: Key frame extraction results of the "Rising Waves" fragment.
  • Figure 5: The performance of DPC and the proposed algorithm in key frame extraction on videos of different lengths.