Enhancing Temporal Action Localization: Advanced S6 Modeling with Recurrent Mechanism
Sangyoun Lee, Juho Jung, Changdae Oh, Sunghee Yun
TL;DR
This work targets Temporal Action Localization (TAL), where conventional sequence models struggle to capture long-range temporal dependencies and temporal causality. It introduces a TAL framework based on the Selective State Space Model (S6), built from the Feature Aggregated Bi-S6 (FA-Bi-S6) block, the Dual Bi-S6 structure, and a recurrent mechanism, which together strengthen temporal and channel-wise dependency modeling without increasing the parameter count. The approach reports state-of-the-art performance on THUMOS-14, ActivityNet, FineAction, and HACS, and ablations support the Dual structure in the Stem module and the recurrent mechanism. The findings suggest that S6-based architectures can improve TAL by modeling temporal causality and long-range context, and they motivate further work on state-space models for video understanding.
Abstract
Temporal Action Localization (TAL) is a critical task in video analysis that identifies the precise start and end times of actions. Existing methods based on CNNs, RNNs, GCNs, and Transformers are limited in capturing long-range dependencies and temporal causality. To address these challenges, we propose a novel TAL architecture that leverages the Selective State Space Model (S6). Our approach integrates the Feature Aggregated Bi-S6 block, the Dual Bi-S6 structure, and a recurrent mechanism to enhance temporal and channel-wise dependency modeling without increasing parameter complexity. Extensive experiments on benchmark datasets demonstrate state-of-the-art results, with mAP scores of 74.2% on THUMOS-14, 42.9% on ActivityNet, 29.6% on FineAction, and 45.8% on HACS. Ablation studies validate the method's effectiveness, showing that the Dual structure in the Stem module and the recurrent mechanism outperform traditional approaches. Our findings demonstrate the potential of S6-based models for TAL, paving the way for future research.
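For readers unfamiliar with how localized start and end times are scored, the mAP figures above are conventionally built on the temporal Intersection over Union (tIoU) between a predicted segment and a ground-truth segment. The following is a minimal sketch of that overlap measure, not the authors' evaluation code. The function names and the 0.5 threshold are illustrative assumptions, since the abstract does not state which thresholds were used.

```python
def temporal_iou(pred, gt):
    """Temporal IoU between two (start, end) segments given in seconds."""
    # Length of the overlap; zero when the segments do not intersect.
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    # Union is the sum of both lengths minus the shared part.
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred, gt, threshold=0.5):
    """Hypothetical helper: a prediction matches if tIoU reaches the threshold.

    The threshold value is an assumption chosen for illustration only.
    """
    return temporal_iou(pred, gt) >= threshold


if __name__ == "__main__":
    # A prediction of 2 s to 6 s against a ground truth of 4 s to 8 s.
    print(temporal_iou((2.0, 6.0), (4.0, 8.0)))
    print(is_true_positive((2.0, 6.0), (4.0, 8.0)))
```

In this example the two segments share 2 seconds out of a 6-second union, so the prediction would not count as a match at the assumed 0.5 threshold.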
