Shot Segmentation Based on Von Neumann Entropy for Key Frame Extraction
Xueqing Zhang, Di Fu, Naihao Liu
TL;DR
This work addresses efficient video key frame extraction by segmenting a video into shots using the Von Neumann entropy of a frame similarity matrix built from perceptual hashes. The first frame of each detected shot is chosen as a key frame, so the number of key frames is determined automatically without prior knowledge, with a practical $O(n^2)$ overall runtime (and an $O(n^3)$ option). The method leverages temporal information and an entropy-minimizing segmentation objective that is solved efficiently via beam search. Experimental results on Open Video and TikTok data show a higher effective information rate and lower redundancy than density-peak clustering, with stable performance across video lengths and content types.
Abstract
Video key frame extraction is important in fields such as video summarization, retrieval, and compression. We propose a video key frame extraction algorithm based on shot segmentation using Von Neumann entropy. Shots are segmented by computing the Von Neumann entropy of the similarity matrix among the frames of the video sequence. The first frame of each shot is selected as its key frame, so the selection incorporates the temporal order of the frames. Experimental results show that the extracted key frames represent the original video content fully and accurately while minimizing the number of repeated frames.
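
To make the central quantity concrete, the following is a minimal sketch of how the Von Neumann entropy of a frame similarity matrix might be computed. It is not the authors' implementation. It assumes each frame is already summarized by a binary perceptual hash, that similarity is the fraction of matching hash bits, and that the similarity matrix is turned into a density matrix by dividing by its trace. The function names `hamming_similarity` and `von_neumann_entropy` are hypothetical.

```python
import numpy as np


def hamming_similarity(hashes: np.ndarray) -> np.ndarray:
    """Pairwise similarity of binary hashes, shape (n_frames, n_bits).

    Assumption: similarity is the fraction of bits on which two hashes agree.
    The result is symmetric positive semidefinite with a unit diagonal.
    """
    h = hashes.astype(np.float64)
    n_bits = h.shape[1]
    # Bits that are both 1, plus bits that are both 0.
    agree = h @ h.T + (1.0 - h) @ (1.0 - h).T
    return agree / n_bits


def von_neumann_entropy(sim: np.ndarray) -> float:
    """Von Neumann entropy S = -sum(lambda_i * ln(lambda_i)).

    Assumption: the density matrix is the similarity matrix divided by its
    trace, so its eigenvalues are non-negative and sum to 1.
    """
    rho = sim / np.trace(sim)
    eigvals = np.linalg.eigvalsh(rho)   # rho is symmetric
    eigvals = eigvals[eigvals > 1e-12]  # treat 0 * ln(0) as 0
    return float(-np.sum(eigvals * np.log(eigvals)))


# Toy example: two "shots" of three identical frames each (made-up hashes).
shot_a = np.array([[0, 0, 0, 0, 1, 1, 1, 1]] * 3)
shot_b = np.array([[1, 0, 1, 0, 1, 0, 1, 0]] * 3)
video = np.vstack([shot_a, shot_b])

entropy_whole = von_neumann_entropy(hamming_similarity(video))
entropy_a = von_neumann_entropy(hamming_similarity(shot_a))
entropy_b = von_neumann_entropy(hamming_similarity(shot_b))
```

In this toy case each homogeneous shot has an entropy close to zero, while the unsegmented sequence has a clearly positive entropy. This is consistent with treating low within-segment entropy as a sign of a coherent shot, although the exact segmentation objective and the beam search used in the paper are not reproduced here.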
