Unsupervised Video Summarization via Iterative Training and Simplified GAN
Hanqing Li, Diego Klabjan, Jean Utke
TL;DR
This work addresses unsupervised video summarization with SUM-SR, a discriminator-free model that pairs a frame selector with a reconstructor and trains them with a reconstruction loss $L_{recon}$ plus a sparsity term $L_{spar}$. It adds a trainable mask, explores an iterative, part-by-part training regime, and proposes an unsupervised model-selection framework that picks the best model without ground-truth annotations. Across SumMe, TVSum, and four new datasets, SUM-SR, especially its 5-iteration variant, outperforms state-of-the-art unsupervised methods by up to about 9% on average, and removing the discriminator reduces training time and model size. The results demonstrate the viability of discriminator-free, iterative training for video summarization, and the paper gives practical guidance for applying the method to longer videos via sampling or shot-based processing.
Abstract
This paper introduces a new unsupervised method for automatic video summarization that borrows ideas from generative adversarial networks but eliminates the discriminator, uses a simple loss function, and trains the different parts of the model separately. An iterative training strategy alternately trains the reconstructor and the frame selector for multiple iterations. Furthermore, a trainable mask vector is added to the model for summary generation during both training and evaluation. The method also includes an unsupervised model selection algorithm. Experiments on two public datasets (SumMe and TVSum) and four datasets we created (Soccer, LoL, MLB, and ShortMLB) demonstrate the contribution of each component to model performance, particularly the iterative training strategy. Evaluations and comparisons with state-of-the-art methods highlight the advantages of the proposed method in performance, stability, and training efficiency.
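The alternating schedule described above can be illustrated with a small toy sketch. It is not the paper's implementation: the per-frame logits used as a "selector", the scalar gain and bias used as a "reconstructor", the squared-error form assumed for $L_{recon}$, the squared deviation from a target selection rate `sigma` assumed for $L_{spar}$, and the finite-difference updates are all simplifying assumptions, and every function name is hypothetical. The only point is to show one part being updated while the other stays frozen, repeated for several iterations.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def total_loss(params, frames, sigma):
    """Toy L_recon + L_spar (assumed forms, not taken from the paper)."""
    scores = [sigmoid(w) for w in params["selector"]]   # per-frame importance in (0, 1)
    gain, bias = params["reconstructor"]
    recon = [gain * s * x + bias for s, x in zip(scores, frames)]
    l_recon = sum((x - r) ** 2 for x, r in zip(frames, recon)) / len(frames)
    l_spar = (sum(scores) / len(scores) - sigma) ** 2    # keep mean score near sigma
    return l_recon + l_spar

def train_group(params, group, frames, sigma, steps=50, lr=0.5, eps=1e-5):
    """Update only params[group]; the other parameter group stays frozen."""
    for _ in range(steps):
        base = total_loss(params, frames, sigma)
        grad = []
        for i in range(len(params[group])):
            old = params[group][i]
            params[group][i] = old + eps                 # forward finite difference
            grad.append((total_loss(params, frames, sigma) - base) / eps)
            params[group][i] = old
        previous = params[group]
        params[group] = [p - lr * g for p, g in zip(previous, grad)]
        if total_loss(params, frames, sigma) > base:     # reject steps that raise the loss
            params[group] = previous
            lr *= 0.5
    return total_loss(params, frames, sigma)

def iterative_training(frames, sigma=0.3, iterations=5):
    """Alternate reconstructor and selector updates for a fixed number of iterations."""
    params = {"selector": [0.0] * len(frames), "reconstructor": [1.0, 0.0]}
    history = [total_loss(params, frames, sigma)]
    for _ in range(iterations):
        history.append(train_group(params, "reconstructor", frames, sigma))
        history.append(train_group(params, "selector", frames, sigma))
    return params, history

if __name__ == "__main__":
    frames = [0.2, 0.9, 0.1, 0.8, 0.3]                   # made-up 1-D frame features
    params, history = iterative_training(frames)
    print("loss after each training phase:", [round(h, 4) for h in history])
```

In this sketch the recorded loss never increases, because a step is kept only when it does not raise the loss. A real model would replace the toy parameters with neural networks and the finite differences with backpropagation.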
