Fine-grained Dynamic Network for Generic Event Boundary Detection
Ziwei Zheng, Lijun He, Le Yang, Fan Li
TL;DR
This work tackles generic event boundary detection (GEBD) by introducing DyBDet, a dynamic network that allocates subnets to video snippets according to the characteristics of their boundaries. It combines a multi-exit backbone with a multi-order difference detector (MDE) and a pairwise contrast module (PCM) to capture both simple, low-level changes and complex, high-level dynamics, and it uses local windows and soft-label training for robustness. Experiments on Kinetics-GEBD and TAPOS show state-of-the-art performance together with substantial efficiency gains from adaptive inference through early exits: DyBDet outperforms prior methods across Rel.Dis thresholds while reducing computational cost. The approach also generalizes well, offers interpretability through pairwise similarity maps, and may apply to broader temporal localization tasks. Overall, DyBDet advances GEBD by enabling fine-grained, efficient boundary detection that adapts to the inherent diversity of event boundaries in long-form videos.
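To make two of these building blocks more concrete, the sketch below shows one plausible reading of them on a sequence of per-snippet features: order-k temporal differences (the "multi-order difference" idea) and a cosine-similarity map among snippets inside a local window (the kind of pairwise similarity map the PCM is described as producing). It is not the authors' MDE or PCM. The function names `multi_order_differences` and `local_similarity_map`, the zero-padding scheme, and the choice of cosine similarity are illustrative assumptions.

```python
import numpy as np


def multi_order_differences(feats: np.ndarray, max_order: int = 2) -> list:
    """Hypothetical helper: temporal differences of order 1..max_order.

    feats has shape (T, D). Each order-k difference has length T - k and is
    left-padded with zeros so that index t stays aligned with snippet t.
    """
    out = []
    cur = feats
    for _ in range(max_order):
        cur = np.diff(cur, axis=0)                  # one more order of differencing
        pad = feats.shape[0] - cur.shape[0]
        out.append(np.pad(cur, ((pad, 0), (0, 0))))  # zero-pad at the start
    return out


def local_similarity_map(feats: np.ndarray, center: int, half_window: int) -> np.ndarray:
    """Hypothetical helper: cosine similarities among snippets in a local window."""
    lo = max(0, center - half_window)
    hi = min(feats.shape[0], center + half_window + 1)
    window = feats[lo:hi]
    unit = window / (np.linalg.norm(window, axis=1, keepdims=True) + 1e-8)
    return unit @ unit.T                              # (W, W) similarity matrix


# Toy example: features switch from basis vector e0 to e1 at snippet 5.
feats = np.zeros((10, 4))
feats[:5, 0] = 1.0
feats[5:, 1] = 1.0
d1, d2 = multi_order_differences(feats)
print(np.argmax(np.linalg.norm(d1, axis=1)))  # 5
S = local_similarity_map(feats, center=5, half_window=2)
print(np.round(S, 2))
```

On this toy sequence the first-order difference peaks at snippet 5, and the local similarity map splits into two blocks on either side of the change. Higher-order differences and learned contrasts would be needed for the subtler, high-level boundaries the TL;DR mentions; this sketch only covers the abrupt, low-level case.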
Abstract
Generic event boundary detection (GEBD) aims to pinpoint the event boundaries that humans naturally perceive, and it plays a crucial role in understanding long-form videos. Because generic boundaries are diverse, spanning different video appearances, objects, and actions, the task remains challenging. Existing methods usually detect all boundaries with the same protocol, regardless of their distinctive characteristics and detection difficulties, which leads to suboptimal performance. A more reasonable approach is to detect each boundary adaptively according to its specific properties. In light of this, we propose DyBDet, a novel dynamic pipeline for generic event boundary detection. By introducing a multi-exit network architecture, DyBDet automatically learns how to allocate subnets to different video snippets, enabling fine-grained detection of diverse boundaries. In addition, we propose a multi-order difference detector to ensure that generic boundaries can be effectively identified and adaptively processed. Extensive experiments on the challenging Kinetics-GEBD and TAPOS datasets demonstrate that the dynamic strategy significantly benefits GEBD, yielding clear improvements in both performance and efficiency over the current state of the art.
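The multi-exit idea can be illustrated with a generic confidence-threshold early-exit loop: each stage refines the features of the snippets still in play, an exit head scores them, and snippets whose prediction is already confident stop there while the rest go deeper. This is only a sketch under stated assumptions. DyBDet learns its subnet allocation, whereas the fixed `threshold` rule here is a hand-written stand-in, and the function `multi_exit_inference`, the toy stages, and the sigmoid heads are hypothetical. The sketch also treats snippets independently, ignoring the temporal context a real backbone would use.

```python
import numpy as np
from typing import Callable, List, Tuple

Stage = Callable[[np.ndarray], np.ndarray]  # (m, D) features -> (m, D) refined features
Head = Callable[[np.ndarray], np.ndarray]   # (m, D) features -> (m,) boundary probabilities


def multi_exit_inference(
    feats: np.ndarray,
    stages: List[Stage],
    heads: List[Head],
    threshold: float = 0.9,
) -> Tuple[np.ndarray, np.ndarray]:
    """Hypothetical early-exit loop; returns per-snippet probability and exit index."""
    n = feats.shape[0]
    probs = np.full(n, np.nan)
    exit_index = np.full(n, -1, dtype=int)
    active = np.arange(n)          # indices of snippets that have not exited yet
    x = feats
    for k, (stage, head) in enumerate(zip(stages, heads)):
        x = stage(x)               # only the remaining snippets are computed
        p = head(x)
        confident = np.maximum(p, 1.0 - p) >= threshold
        is_last = k == len(stages) - 1
        done = confident | is_last  # everything still active exits at the last stage
        probs[active[done]] = p[done]
        exit_index[active[done]] = k
        active = active[~done]
        x = x[~done]
        if active.size == 0:
            break
    return probs, exit_index


def sigmoid_head(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x[:, 0]))


# Toy example: one clear boundary, one ambiguous snippet, one clear non-boundary.
feats = np.array([[5.0], [0.1], [-6.0]])
stages = [lambda x: x, lambda x: 2.0 * x]
probs, exit_index = multi_exit_inference(feats, stages, [sigmoid_head, sigmoid_head])
print(exit_index.tolist())  # [0, 1, 0]
```

The two easy snippets leave at the first exit, and only the ambiguous one pays for the second stage, which is the source of the efficiency gain that dynamic inference targets.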
