What's in the Flow? Exploiting Temporal Motion Cues for Unsupervised Generic Event Boundary Detection

Sourabh Vasant Gothe; Vibhav Agarwal; Sourav Ghosh; Jayesh Rajkumar Vachhani; Pranay Kashyap; Barath Raj Kandur Raja

TL;DR

This work presents FlowGEBD, a non-parametric, unsupervised approach for generic event boundary detection that relies solely on motion cues derived from optical flow. It introduces two algorithms, Pixel Tracking and Flow Normalization, and their ensemble with a temporal refinement step, achieving state-of-the-art unsupervised performance on Kinetics-GEBD ($F1@0.05=0.713$) and TAPOS ($F1=0.623$). The results demonstrate that motion information alone can rival supervised baselines while offering substantial computational efficiency, enabling mobile-friendly, real-time boundary detection for applications like video summarization and editing. Overall, FlowGEBD provides a lightweight alternative to deep models for GEBD, with robustness across datasets and clear avenues for further enhancements such as bidirectional processing.

Abstract

The Generic Event Boundary Detection (GEBD) task aims to recognize generic, taxonomy-free boundaries that segment a video into meaningful events. Current methods typically involve a neural model trained on a large volume of data, demanding substantial computational power and storage space. We explore two pivotal questions pertaining to GEBD: Can non-parametric algorithms outperform unsupervised neural methods? Does motion information alone suffice for high performance? These questions drive us to harness motion cues algorithmically for identifying generic event boundaries in videos. In this work, we propose FlowGEBD, a non-parametric, unsupervised technique for GEBD. Our approach comprises two algorithms that use optical flow: (i) Pixel Tracking and (ii) Flow Normalization. Through thorough experimentation on the challenging Kinetics-GEBD and TAPOS datasets, we establish FlowGEBD as the new state-of-the-art (SOTA) among unsupervised methods. FlowGEBD exceeds the neural models on the Kinetics-GEBD dataset, obtaining an F1@0.05 score of 0.713, an absolute gain of 31.7% over the unsupervised baseline, and achieves an average F1 score of 0.623 on the TAPOS validation dataset.
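To make the reported F1@0.05 figure concrete, the following is a minimal sketch of how a boundary-detection F1 score at a relative-distance threshold might be computed. It assumes the convention commonly used for GEBD benchmarks, in which a predicted boundary counts as correct if it lies within 5% of the video duration of an unmatched ground-truth boundary. The function name `f1_at_rel_dis`, its parameters, and the greedy matching rule are illustrative assumptions, not the official evaluation code.

```python
def f1_at_rel_dis(pred, gt, duration, rel_dis=0.05):
    """F1 for boundary timestamps (in seconds) under a relative-distance tolerance.

    Assumption: tolerance = rel_dis * video duration, and each ground-truth
    boundary can be matched by at most one prediction (greedy matching).
    """
    tol = rel_dis * duration      # allowed absolute error in seconds
    unmatched = sorted(gt)        # ground-truth boundaries not yet matched
    tp = 0
    for p in sorted(pred):
        if not unmatched:
            break
        # Closest remaining ground-truth boundary to this prediction.
        nearest = min(unmatched, key=lambda g: abs(g - p))
        if abs(nearest - p) <= tol:
            tp += 1
            unmatched.remove(nearest)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gt)
    return 2 * precision * recall / (precision + recall)


# Hypothetical 10-second video: two of three predictions fall within 0.5 s
# of a ground-truth boundary, so precision = recall = 2/3.
score = f1_at_rel_dis(pred=[1.0, 4.2, 9.0], gt=[1.1, 4.0, 7.0], duration=10.0)
```

Under this convention, a tighter threshold such as 0.05 rewards precise boundary localization, which is why it is typically the headline number.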

Paper Structure (34 sections, 5 equations, 6 figures, 5 tables, 3 algorithms)

Introduction
Related Work
Generic Event Boundary Detection
Learning motion and visual correspondences
Proposed Methodology
FlowGEBD with Pixel Tracking (PT)
Framewise mode
Method.
Can we improve further?
Patchwise mode
Method.
FlowGEBD with Optical Flow Normalization
Framewise mode
Patchwise mode
Method.
...and 19 more sections

Figures (6)

Figure 1: F1@0.05 scores of different methods on the Kinetics-GEBD validation dataset. Our method, FlowGEBD, achieves state-of-the-art results among unsupervised methods, compared against non-parametric [castellano2018pyscenedetect] and parametric [shou2021generic, kang2022uboco, wang2021coseg] benchmarks.
Figure 2: FlowGEBD enables applications on smartphones, such as short video segment sharing, summarization, and editing, by identifying generic video moments
Figure 3: FlowGEBD accepts a video as input and predicts a set of event boundaries, $\mathcal{B}$. Visual representation of patches with $n_w=n_h=4$ (right). $\Box$: Base patches, $\Box$: Centroidal
Figure 4: Pixel Tracking: Visual representation of $3\times3$ patchwise pixel tracking along temporal dimension ($\theta_1 = 0.4$)
Figure 5: Flow Normalization: Visual representation of normalized $3\times3$ patchwise max flow along temporal dimension ($\theta_2 = 0.25$)
...and 1 more figure
