FCA-RAC: First Cycle Annotated Repetitive Action Counting
Jiada Lu, WeiWei Zhou, Xiang Qian, Dongze Lian, Yanyu Xu, Weifeng Wang, Lina Cao, Shenghua Gao
TL;DR
Repetitive action counting suffers from limited action diversity in existing datasets, hindering generalization to unseen actions. The authors propose FCA-RAC, a four-part framework comprising First Cycle Annotated labeling, Dynamic Input Sampling, Multi-Temporal Granularity Convolution, and Training Knowledge Augmentation to exploit the relationship between the first action cycle and subsequent actions. Empirical results on RepCount-A, Countix-AV, UCFRep, and QUVA show superior MAE and OBO scores and strong generalization to unseen actions, aided by a nearest-neighbor embedding mechanism in TKA that reduces reliance on test-time adaptation. The approach delivers robust action counting across seen and unseen actions, with practical implications for real-world RAC tasks including fitness analytics and video understanding.
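For readers unfamiliar with the two metrics named above, the sketch below computes them as they are commonly defined in repetitive action counting work: MAE as the absolute count error normalized by the ground-truth count (lower is better), and OBO (Off-By-One accuracy) as the fraction of videos whose predicted count is within one of the ground truth (higher is better). The function names and the toy counts are illustrative assumptions, not values from the paper, and the code assumes every ground-truth count is positive.

```python
from typing import Sequence


def mae(pred: Sequence[float], true: Sequence[float]) -> float:
    """Mean absolute count error, normalized per video by the true count.

    Assumes every ground-truth count is greater than zero.
    """
    return sum(abs(p - t) / t for p, t in zip(pred, true)) / len(true)


def obo(pred: Sequence[float], true: Sequence[float]) -> float:
    """Off-By-One accuracy: share of videos with |pred - true| <= 1."""
    return sum(abs(p - t) <= 1 for p, t in zip(pred, true)) / len(true)


# Toy counts for four videos (made up for illustration only).
predicted = [10, 4, 7, 12]
ground_truth = [10, 5, 10, 12]

print(mae(predicted, ground_truth))  # 0.125
print(obo(predicted, ground_truth))  # 0.75
```

In the toy data, the third video is off by three repetitions, so it contributes to MAE and counts as a miss for OBO, while the off-by-one error on the second video still counts as a hit.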
Abstract
Repetitive action counting quantifies the frequency of specific actions performed by individuals. However, existing action-counting datasets have limited action diversity, potentially hampering model performance on unseen actions. To address this issue, we propose a framework called First Cycle Annotated Repetitive Action Counting (FCA-RAC). The framework has four parts: 1) a labeling technique that annotates each training video with the start and end of the first action cycle, along with the total action count, which enables the model to capture the correlation between the initial action cycle and subsequent actions; 2) an adaptive sampling strategy that maximizes action information retention by adjusting to the speed of the first annotated action cycle in each video; 3) a Multi-Temporal Granularity Convolution (MTGC) module that uses the multi-scale first action as a kernel to convolve across the entire video, which enables the model to capture action variations at different time scales within the video; and 4) a strategy called Training Knowledge Augmentation (TKA) that exploits the annotated first action cycle information from the entire dataset, which allows the network to harness shared characteristics across actions and thereby improves model performance and generalizability to unseen actions. Experimental results demonstrate that our approach achieves superior outcomes on RepCount-A and related datasets, highlighting the efficacy of our framework in improving model performance on seen and unseen actions. Our paper contributes to the field of action counting by addressing the limitations of existing datasets and proposing novel techniques for improving model generalizability.
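To make parts 2 and 3 more concrete, here is a minimal sketch of the underlying idea, not the authors' implementation. It assumes each frame is reduced to a single scalar feature, picks a frame stride from the length of the annotated first cycle, and then slides the first cycle over the video as a template at several temporal scales using normalized cross-correlation. All names (`sampling_stride`, `multi_scale_response`, `frames_per_cycle`, `scales`) and the scale values are hypothetical; the actual MTGC module operates on learned features inside a network.

```python
import math
from typing import Dict, List, Sequence


def sampling_stride(first_start: int, first_end: int,
                    frames_per_cycle: int = 8) -> int:
    """Pick a stride so the first cycle spans about `frames_per_cycle` samples."""
    return max(1, (first_end - first_start) // frames_per_cycle)


def resample(seq: Sequence[float], length: int) -> List[float]:
    """Nearest-neighbor resampling of `seq` to `length` elements."""
    n = len(seq)
    return [seq[min(n - 1, i * n // length)] for i in range(length)]


def _centered(xs: Sequence[float]) -> List[float]:
    mean = sum(xs) / len(xs)
    return [x - mean for x in xs]


def correlate(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Normalized cross-correlation of `kernel` with every window of `signal`."""
    k = _centered(kernel)
    k_norm = math.sqrt(sum(v * v for v in k)) or 1.0
    out = []
    for i in range(len(signal) - len(kernel) + 1):
        w = _centered(signal[i:i + len(kernel)])
        w_norm = math.sqrt(sum(v * v for v in w)) or 1.0
        out.append(sum(a * b for a, b in zip(w, k)) / (w_norm * k_norm))
    return out


def multi_scale_response(signal: Sequence[float], first_start: int,
                         first_end: int,
                         scales: Sequence[float] = (0.5, 1.0, 2.0)
                         ) -> Dict[float, List[float]]:
    """Use the first cycle, stretched to several lengths, as a sliding template."""
    cycle = signal[first_start:first_end]
    responses = {}
    for s in scales:
        length = max(2, round(len(cycle) * s))
        responses[s] = correlate(signal, resample(cycle, length))
    return responses


# Toy video feature: a sine wave with a period of 8 frames, 40 frames long.
signal = [math.sin(2 * math.pi * i / 8) for i in range(40)]
responses = multi_scale_response(signal, first_start=0, first_end=8)
peaks = [i for i, r in enumerate(responses[1.0]) if r > 0.99]
print(peaks)
```

On this toy signal the scale-1.0 response peaks wherever a new repetition starts, which is the intuition behind using the first cycle as a kernel. The other scales would respond more strongly if later repetitions were performed faster or slower than the first one.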
