Rethinking Pseudo-Label Guided Learning for Weakly Supervised Temporal Action Localization from the Perspective of Noise Correction

Quan Zhang, Yuxin Qi, Xi Tang, Rui Yuan, Xi Lin, Ke Zhang, Chun Yuan

TL;DR

This work addresses weakly supervised temporal action localization (WTAL) by treating pseudo-labels as noisy signals that can mislead detectors. It introduces NoCo, a two-stage noise-correction framework. The first stage, CALA, refines pseudo-label boundaries using context. The second stage is an online teacher-student framework with Missing Instance Compensation (MIC), Ambiguous Instance Correction (AIC), and High-Quality Pseudo-Label Mining (HPM), which iteratively corrects label noise and reweights supervision. The approach is decoupled from WTAL baselines, enabling plug-and-play integration, and yields state-of-the-art results on THUMOS14 and ActivityNet v1.2 with substantially faster inference, since only the student model runs during testing. The combination of boundary refinement, online noise correction, and adaptive loss weighting generalizes across WTAL models and provides a practical path toward robust, efficient weakly supervised TAL.

Abstract

Pseudo-label learning methods have been widely applied in weakly-supervised temporal action localization. Existing works directly use a weakly-supervised base model to generate instance-level pseudo-labels for training the fully-supervised detection head. We argue that the noise in pseudo-labels interferes with the learning of the fully-supervised detection head, leading to significant performance degradation. Issues with noisy labels include: (1) inaccurate boundary localization; (2) undetected short action clips; (3) multiple adjacent segments incorrectly detected as one segment. To address these issues, we introduce a two-stage noisy-label learning strategy that harnesses every potentially useful signal in the noisy labels. First, we propose a frame-level pseudo-label generation model with a context-aware denoising algorithm to refine the boundaries. Second, we introduce an online-revised teacher-student framework with a missing instance compensation module and an ambiguous instance correction module to solve the short-action-missing and many-to-one problems. In addition, we apply a high-quality pseudo-label mining loss in our online-revised teacher-student framework to assign different weights to the noisy labels for more effective training. Our model substantially outperforms the previous state-of-the-art method in both detection accuracy and inference speed on the THUMOS14 and ActivityNet v1.2 benchmarks.

Paper Structure (20 sections, 8 equations, 3 figures, 4 tables, 1 algorithm)

This paper contains 20 sections, 8 equations, 3 figures, 4 tables, 1 algorithm.

Figures (3)

  • Figure 1: Noise correction modeling for WTAL. NoCo introduces pseudo-label noise correction modules to address the typical failures in existing WTAL methods.
  • Figure 2: (a) Framework overview. The dark gray modules indicate the base method (e.g., the WTAL model and augmentation), while the others are our noise correction modules. (b) Ambiguous Instance Correction uses teacher predictions to refine instance boundaries by mining high-IoU samples and adaptively aggregating context. It addresses the many-to-one and inaccurate boundary problems. (c) Missing Instance Compensation adds missing instances based on teacher predictions. It aims to solve the short-action-missing problem. (d) The High-Quality Pseudo-label Mining Loss assigns adaptive weights to each noisy action instance to mine high-quality pseudo-labels.
  • Figure 3: Visualization of ground-truth and predictions.
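
To make the idea behind Missing Instance Compensation in Figure 2(c) more concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes that a teacher prediction is added to the pseudo-label set when it is confident and barely overlaps any existing label. The function names, the `(start, end, score)` segment layout, and both thresholds (`score_thresh`, `iou_thresh`) are hypothetical choices made for this example; the paper's actual selection rule may differ.

```python
from typing import List, Tuple

# Hypothetical segment layout for this sketch: (start_sec, end_sec, confidence).
Segment = Tuple[float, float, float]


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection-over-union of two temporal segments (scores are ignored)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def compensate_missing(
    pseudo_labels: List[Segment],
    teacher_preds: List[Segment],
    score_thresh: float = 0.7,  # assumed confidence cut-off
    iou_thresh: float = 0.1,    # assumed maximum overlap with existing labels
) -> List[Segment]:
    """Add confident teacher predictions that no existing label already covers."""
    compensated = list(pseudo_labels)
    # Visit the most confident teacher predictions first.
    for pred in sorted(teacher_preds, key=lambda s: s[2], reverse=True):
        if pred[2] < score_thresh:
            continue  # not confident enough to trust as a new instance
        if all(temporal_iou(pred, kept) <= iou_thresh for kept in compensated):
            compensated.append(pred)  # treated as a previously missed instance
    return sorted(compensated, key=lambda s: s[0])


if __name__ == "__main__":
    labels = [(0.0, 10.0, 1.0)]
    preds = [(1.0, 9.0, 0.9), (12.0, 13.0, 0.8), (20.0, 21.0, 0.3)]
    print(compensate_missing(labels, preds))
```

In this toy input, the first teacher prediction overlaps the existing label heavily and is skipped, the short segment at 12 to 13 seconds is added, and the low-confidence segment is ignored.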