What's in the Flow? Exploiting Temporal Motion Cues for Unsupervised Generic Event Boundary Detection

Sourabh Vasant Gothe, Vibhav Agarwal, Sourav Ghosh, Jayesh Rajkumar Vachhani, Pranay Kashyap, Barath Raj Kandur Raja

TL;DR

This work presents FlowGEBD, a non-parametric, unsupervised approach for generic event boundary detection that relies solely on motion cues derived from optical flow. It introduces two algorithms, Pixel Tracking and Flow Normalization, and their ensemble with a temporal refinement step, achieving state-of-the-art unsupervised performance on Kinetics-GEBD ($F1@0.05=0.713$) and TAPOS ($F1=0.623$). The results demonstrate that motion information alone can rival supervised baselines while offering substantial computational efficiency, enabling mobile-friendly, real-time boundary detection for applications like video summarization and editing. Overall, FlowGEBD provides a lightweight alternative to deep models for GEBD, with robustness across datasets and clear avenues for further enhancements such as bidirectional processing.

Abstract

The Generic Event Boundary Detection (GEBD) task aims to recognize generic, taxonomy-free boundaries that segment a video into meaningful events. Current methods typically involve a neural model trained on a large volume of data, demanding substantial computational power and storage space. We explore two pivotal questions pertaining to GEBD: Can non-parametric algorithms outperform unsupervised neural methods? Does motion information alone suffice for high performance? These questions drive us to harness motion cues algorithmically for identifying generic event boundaries in videos. In this work, we propose FlowGEBD, a non-parametric, unsupervised technique for GEBD. Our approach comprises two algorithms that use optical flow: (i) Pixel Tracking and (ii) Flow Normalization. Through thorough experimentation on the challenging Kinetics-GEBD and TAPOS datasets, we establish FlowGEBD as the new state-of-the-art (SOTA) among unsupervised methods. FlowGEBD exceeds the neural models on the Kinetics-GEBD dataset, obtaining an F1@0.05 score of 0.713, an absolute gain of 31.7% over the unsupervised baseline, and achieves an average F1 score of 0.623 on the TAPOS validation dataset.
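
The text above reports F1@0.05 without defining it. As a rough illustration only, the sketch below assumes the usual GEBD convention: a predicted boundary counts as correct if it lies within a relative distance of 0.05 (here taken as 5% of the video duration) of a ground-truth boundary not yet matched. The function name `f1_at_rel_dis`, the greedy matching strategy, and the use of video duration as the normalizer are assumptions made for this example, not details taken from the paper.

```python
def f1_at_rel_dis(pred, gt, duration, rel_dis=0.05):
    """Hypothetical helper: F1 for boundary timestamps (in seconds).

    Assumption: a prediction is a true positive if it lies within
    rel_dis * duration of a ground-truth boundary that is still unmatched.
    """
    if not pred or not gt:
        return 0.0
    tol = rel_dis * duration
    unmatched = sorted(gt)
    tp = 0
    for p in sorted(pred):
        if not unmatched:
            break
        # Greedy choice: the closest ground-truth boundary still unmatched.
        best = min(unmatched, key=lambda g: abs(g - p))
        if abs(best - p) <= tol:
            tp += 1
            unmatched.remove(best)  # each ground-truth boundary matches once
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gt)
    return 2 * precision * recall / (precision + recall)


# Toy example: a 10 s video, so the tolerance is 0.5 s.
score = f1_at_rel_dis(pred=[1.0, 5.2, 9.0], gt=[1.2, 5.0], duration=10.0)
```

In this toy case two of the three predictions match, giving precision 2/3 and recall 1. Official evaluation scripts may match boundaries differently (for example, against multiple annotators), so scores from this sketch are not directly comparable with the reported numbers.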

Paper Structure (34 sections, 5 equations, 6 figures, 5 tables, 3 algorithms)

Figures (6)

  • Figure 1: F1@0.05 scores of different methods on the Kinetics-GEBD validation dataset. Our method, FlowGEBD, achieves state-of-the-art results among unsupervised methods compared to non-parametric \cite{castellano2018pyscenedetect} and parametric \cite{shou2021generic, kang2022uboco, wang2021coseg} benchmarks.
  • Figure 2: FlowGEBD enables smartphone applications such as short video segment sharing, summarization, and editing by identifying generic video moments.
  • Figure 3: FlowGEBD accepts a video as input and predicts a set of event boundaries, $\mathcal{B}$. Visual representation of patches with $n_w=n_h=4$ (right). $\Box$: Base patches, $\Box$: Centroidal
  • Figure 4: Pixel Tracking: Visual representation of $3\times3$ patchwise pixel tracking along temporal dimension ($\theta_1 = 0.4$)
  • Figure 5: Flow Normalization: Visual representation of normalized $3\times3$ patchwise max flow along temporal dimension ($\theta_2 = 0.25$)
  • ...and 1 more figure