Towards Adaptive Pseudo-label Learning for Semi-Supervised Temporal Action Localization
Feixiang Zhou, Bryan Williams, Hossein Rahmani
TL;DR
This work tackles noisy pseudo labels in semi-supervised temporal action localization (SS-TAL) by introducing Adaptive Pseudo-label Learning (APL), which scores pseudo-label quality jointly through Adaptive Label Quality Assessment (ALQA) and refines the selection with an Instance-level Consistency Discriminator (ICD). ALQA combines localization reliability with classification confidence: it predicts $P_{tiou}$ and $P_{tnd}$, uses a DIoU-based soft label for classification, and forms a joint score $\hat{P}=\hat{P}_{diou}\odot\hat{P}_{cls}$; pseudo labels are then selected dynamically using thresholds and Soft-NMS. ICD exploits inter-instance consistency to filter ambiguous positives and mine potential positives, computing similarity scores between predicted instances and labeled examples with a learned discriminator $\mathcal{D}$. Action-aware Contrastive Pre-training (ACP) provides unsupervised, multi-scale frame-level representations through coarse- and fine-grained contrasts, improving discrimination both among actions and between actions and backgrounds. On THUMOS14 and ActivityNet v1.3, the method achieves state-of-the-art results under multiple labeling ratios. Together, ALQA, ICD, and ACP improve pseudo-label selection and representation learning, which helps reduce annotation cost for temporal action localization.
Abstract
Alleviating noisy pseudo labels remains a key challenge in Semi-Supervised Temporal Action Localization (SS-TAL). Existing methods often filter pseudo labels based on strict conditions, but they typically assess classification and localization quality separately, leading to suboptimal pseudo-label ranking and selection. In particular, there might be inaccurate pseudo labels within selected positives, alongside reliable counterparts erroneously assigned to negatives. To tackle these problems, we propose a novel Adaptive Pseudo-label Learning (APL) framework to facilitate better pseudo-label selection. Specifically, to improve the ranking quality, Adaptive Label Quality Assessment (ALQA) is proposed to jointly learn classification confidence and localization reliability, followed by dynamically selecting pseudo labels based on the joint score. Additionally, we propose an Instance-level Consistency Discriminator (ICD) for eliminating ambiguous positives and mining potential positives simultaneously based on inter-instance intrinsic consistency, thereby leading to a more precise selection. We further introduce a general unsupervised Action-aware Contrastive Pre-training (ACP) to enhance the discrimination both within actions and between actions and backgrounds, which benefits SS-TAL. Extensive experiments on THUMOS14 and ActivityNet v1.3 demonstrate that our method achieves state-of-the-art performance under various semi-supervised settings.
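The joint scoring and selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the fixed `score_thresh`, the Gaussian Soft-NMS decay with `sigma`, and the 1D DIoU form (temporal IoU minus the squared center distance normalized by the squared enclosing length) are all assumptions made for illustration, and the paper's dynamic thresholds are replaced by a single constant.

```python
import math

def tiou(a, b):
    """Temporal IoU of two segments given as (start, end)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def diou_1d(a, b):
    """Assumed 1D DIoU: tIoU minus (center distance / enclosing length)^2."""
    hull = max(a[1], b[1]) - min(a[0], b[0])
    if hull <= 0:
        return 0.0
    center_dist = 0.5 * ((a[0] + a[1]) - (b[0] + b[1]))
    return tiou(a, b) - (center_dist / hull) ** 2

def select_pseudo_labels(segments, p_diou, p_cls, score_thresh=0.3, sigma=0.5):
    """Rank candidates by the joint score p_diou * p_cls, then apply
    Gaussian Soft-NMS and keep segments whose score stays above the threshold.
    score_thresh and sigma are hypothetical values, not taken from the paper."""
    # Joint score: element-wise product of localization and classification scores.
    cands = [(seg, d * c) for seg, d, c in zip(segments, p_diou, p_cls)]
    kept = []
    while cands:
        best = max(range(len(cands)), key=lambda i: cands[i][1])
        best_seg, best_score = cands.pop(best)
        if best_score < score_thresh:
            break  # every remaining candidate scores lower still
        kept.append((best_seg, best_score))
        # Soft-NMS: decay overlapping candidates instead of discarding them.
        cands = [(seg, s * math.exp(-tiou(best_seg, seg) ** 2 / sigma))
                 for seg, s in cands]
    return kept

if __name__ == "__main__":
    segments = [(0.0, 10.0), (1.0, 11.0), (20.0, 30.0)]
    for seg, score in select_pseudo_labels(segments, [0.9, 0.8, 0.7], [0.9, 0.9, 0.8]):
        print(seg, round(score, 3))
```

In this toy input the second segment overlaps the first heavily, so its joint score is decayed below the threshold and only the first and third segments survive. A prediction with high classification confidence but poor localization (or vice versa) gets a low product, which is the ranking behavior the joint score is meant to produce.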
