HR-Pro: Point-supervised Temporal Action Localization via Hierarchical Reliability Propagation

Huaxin Zhang, Xiang Wang, Xiaohao Xu, Zhiwu Qing, Changxin Gao, Nong Sang

TL;DR

HR-Pro tackles point-supervised temporal action localization by propagating reliability through a two-stage, hierarchical framework. It first builds discriminative snippet representations with an online prototype memory and a Reliability-aware Attention Block, then connects snippets to complete proposals via a point-based proposal mechanism and reliability-guided scoring and regression. The method achieves state-of-the-art performance on THUMOS14 with an average mAP of 60.3% and reports improvements on multiple benchmarks, which supports exploiting point annotation reliability at both the snippet and instance levels. The approach offers a label-efficient path to precise temporal action localization in untrimmed videos.

Abstract

Point-supervised Temporal Action Localization (PSTAL) is an emerging research direction for label-efficient learning. However, current methods mainly focus on optimizing the network at either the snippet level or the instance level, neglecting the inherent reliability of point annotations at both levels. In this paper, we propose a Hierarchical Reliability Propagation (HR-Pro) framework, which consists of two reliability-aware stages: Snippet-level Discrimination Learning and Instance-level Completeness Learning. Both stages explore the efficient propagation of high-confidence cues in point annotations. For snippet-level learning, we introduce an online-updated memory to store reliable snippet prototypes for each class. We then employ a Reliability-aware Attention Block to capture both intra-video and inter-video dependencies of snippets, resulting in more discriminative and robust snippet representations. For instance-level learning, we propose a point-based proposal generation approach as a means of connecting snippets and instances, which produces high-confidence proposals for further optimization at the instance level. Through multi-level reliability-aware learning, we obtain more reliable confidence scores and more accurate temporal boundaries for the predicted proposals. Our HR-Pro achieves state-of-the-art performance on multiple challenging benchmarks, including an average mAP of 60.3% on THUMOS14. Notably, our HR-Pro largely surpasses all previous point-supervised methods and even outperforms several competitive fully supervised methods. Code will be available at https://github.com/pipixin321/HR-Pro.
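
To make the "online-updated memory" of per-class snippet prototypes more concrete, the following is a minimal sketch and not the authors' implementation. It assumes the memory holds one vector per class and is refreshed with a momentum (exponential moving average) update from the features of point-annotated snippets. The class name `PrototypeMemory`, the `momentum` parameter, and its default value are hypothetical and chosen only for illustration.

```python
import numpy as np


class PrototypeMemory:
    """Hypothetical per-class prototype store (illustrative, not the paper's code)."""

    def __init__(self, num_classes, dim, momentum=0.9):
        # One prototype vector per action class.
        self.protos = np.zeros((num_classes, dim), dtype=np.float64)
        # Tracks which classes have received at least one update.
        self.initialized = np.zeros(num_classes, dtype=bool)
        # Assumed update rule: exponential moving average with this momentum.
        self.momentum = momentum

    def update(self, class_id, point_features):
        """Blend the mean feature of point-annotated snippets into one prototype."""
        new = np.asarray(point_features, dtype=np.float64).mean(axis=0)
        if not self.initialized[class_id]:
            # First observation for this class: copy it directly.
            self.protos[class_id] = new
            self.initialized[class_id] = True
        else:
            m = self.momentum
            self.protos[class_id] = m * self.protos[class_id] + (1.0 - m) * new
        return self.protos[class_id]


# Usage: two updates for class 0 with 3-dimensional features.
memory = PrototypeMemory(num_classes=2, dim=3, momentum=0.5)
first = memory.update(0, [[1.0, 1.0, 1.0], [3.0, 3.0, 3.0]]).copy()
second = memory.update(0, [[4.0, 4.0, 4.0]]).copy()
```

Under this assumed rule, a larger momentum makes the prototypes change more slowly, which trades responsiveness for stability across training batches.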

Paper Structure (17 sections, 15 equations, 8 figures, 4 tables, 1 algorithm)

Figures (8)

  • Figure 1: Motivation illustration. Given the point-level annotation (in purple), we consider the action reliability prior at both the snippet level and the instance proposal level to enable reliability-aware action representation learning. Specifically, our insight is to propagate reliable prototypes to produce more discriminative snippet-level scores and more reliable and complete instance-level scores. Darker color (greener or more orange) indicates higher reliability. Here, a case with one action class is shown for brevity.
  • Figure 2: Overview of Hierarchical Reliability Propagation (HR-Pro). We propagate reliable prototypes during two-stage action localization learning, i.e., Snippet-level Discrimination Learning and Instance-level Completeness Learning. (1) Snippet level: we aim to obtain snippet representations with good inter-class discrimination and action-background discrimination. (2) Instance level: we aim to refine the confidence score and boundary of the coarse proposals generated from snippet-level output.
  • Figure 3: Architecture detail of Reliability-aware Attention Block (RAB). Reliable prototype memory (in green) is injected into the original snippet features (in grey) to introduce reliable cues via the attention mechanism.
  • Figure 4: Qualitative results for two action categories, GolfSwing (left) and HammerThrow (right), on THUMOS14. We compare the detection results of HR-Pro and LACP. The orange and blue bars indicate the ground truth and predicted localization results, respectively; blue curves represent snippet-level predictions. Prediction errors are marked with red bounding boxes.
  • Figure 5: Visualization of detection results on the THUMOS14 dataset before (left) and after (right) instance-level completeness learning. The x-axis and y-axis represent time and the reliability score, respectively. We observe that the discrepancy between good and bad predictions is enlarged significantly after instance-level completeness learning.
  • ...and 3 more figures
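
The Figure 3 caption describes injecting the reliable prototype memory into snippet features through attention. The sketch below illustrates that general idea only, under stated assumptions: snippet features act as queries, class prototypes act as keys and values, and the attended result is added back as a residual. The function name `inject_prototypes` is hypothetical, and the learned projections and other layers that an actual Reliability-aware Attention Block would likely contain are omitted.

```python
import numpy as np


def inject_prototypes(snippets, prototypes):
    """Cross-attend snippet features (T, D) to class prototypes (C, D).

    Simplified illustration: no learned query/key/value projections.
    Returns the updated features (T, D) and attention weights (T, C).
    """
    snippets = np.asarray(snippets, dtype=np.float64)
    prototypes = np.asarray(prototypes, dtype=np.float64)
    dim = snippets.shape[1]
    # Scaled dot-product similarity between each snippet and each prototype.
    logits = snippets @ prototypes.T / np.sqrt(dim)
    # Numerically stable softmax over the prototype axis.
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)
    # Residual connection: original features plus attended prototype mix.
    return snippets + weights @ prototypes, weights


# Usage: 3 all-zero snippets attend uniformly to 2 prototypes.
protos = np.array([[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]])
updated, attn = inject_prototypes(np.zeros((3, 4)), protos)
```

The residual form keeps the original snippet features intact and adds prototype information on top, which is one common way to inject external memory without discarding the input signal.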