Semi-Supervised Teacher-Reference-Student Architecture for Action Quality Assessment

Wulian Yun, Mengshi Qi, Fei Peng, Huadong Ma

TL;DR

The paper tackles the challenge of action quality assessment with limited labeled data by introducing a semi-supervised framework that combines a teacher, a student, and a reference network. The teacher and reference networks generate pseudo-labels for unlabeled data, which supervise the student, and a confidence memory stores high-fidelity predictions to improve reliability. Key contributions include the novel reference network for additional supervision and the memory mechanism that preserves the most trustworthy pseudo-labels, yielding significant gains over existing semi-supervised AQA methods across three benchmarks. The approach enables accurate AQA with reduced labeling effort, enhancing scalability for applications in sports scoring and surgical skill assessment.

Abstract

Existing action quality assessment (AQA) methods often require a large number of label annotations for fully supervised learning, and these annotations are laborious and expensive to produce. In practice, labeled data are difficult to obtain because the AQA annotation process requires domain-specific expertise. In this paper, we propose a novel semi-supervised method that improves AQA by exploiting a large amount of unlabeled data together with a small portion of labeled data. Unlike the traditional teacher-student network, our teacher-reference-student architecture learns from both unlabeled and labeled data: the teacher network and the reference network generate pseudo-labels for unlabeled data, and these pseudo-labels supervise the student network. Specifically, the teacher predicts pseudo-labels by capturing high-level features of unlabeled data. The reference network provides adequate supervision of the student network by referring to additional action information. Moreover, we introduce a confidence memory that improves the reliability of pseudo-labels by storing the most accurate outputs produced so far by the teacher network and the reference network. To validate our method, we conduct extensive experiments on three AQA benchmark datasets. Experimental results show that our method achieves significant improvements and outperforms existing semi-supervised AQA methods.
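
The confidence memory idea can be illustrated with a minimal sketch. The abstract does not say how accuracy is estimated for unlabeled data, so this sketch assumes each pseudo-label arrives with a scalar confidence score and that the memory keeps the highest-confidence pseudo-label seen so far for each unlabeled sample. The names `ConfidenceMemory`, `MemoryEntry`, `update`, and the sample identifiers are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class MemoryEntry:
    pseudo_label: float  # predicted quality score for one unlabeled sample
    confidence: float    # assumed scalar confidence attached to that prediction


class ConfidenceMemory:
    """Keeps, per sample, the pseudo-label with the highest confidence so far.

    Assumption: "most accurate" is approximated by a confidence score,
    because no ground truth exists for unlabeled samples.
    """

    def __init__(self) -> None:
        self._entries: Dict[str, MemoryEntry] = {}

    def update(self, sample_id: str, pseudo_label: float, confidence: float) -> float:
        """Store the new prediction only if it beats the stored confidence.

        Returns the pseudo-label that should supervise the student.
        """
        best = self._entries.get(sample_id)
        if best is None or confidence > best.confidence:
            best = MemoryEntry(pseudo_label, confidence)
            self._entries[sample_id] = best
        return best.pseudo_label


memory = ConfidenceMemory()
first = memory.update("clip_01", 71.5, confidence=0.60)   # stored: first entry
second = memory.update("clip_01", 64.0, confidence=0.40)  # ignored: lower confidence
third = memory.update("clip_01", 78.0, confidence=0.90)   # replaces the stored entry
```

In this sketch a later, less confident prediction never overwrites an earlier, more confident one, which is one plausible reading of "storing the most accurate outputs"; the paper may use a different criterion.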

Paper Structure

This paper contains 18 sections, 11 equations, 4 figures, 5 tables, and 1 algorithm.

Figures (4)

  • Figure 1: Illustrations of (a) the fully supervised AQA task, (b) the traditional teacher-student network, and (c) our semi-supervised AQA task. Fully supervised methods learn from labeled data only, while our semi-supervised method learns from both labeled and unlabeled data. In addition, unlike the traditional teacher-student network, our method includes a reference network designed for the AQA task that provides additional supervision for the student.
  • Figure 2: The overall framework of our method. Our method starts with a burn-in stage of training, followed by the teacher-reference-student learning stage. It mainly contains a teacher network, a student network and a reference network. The teacher network and reference network predict pseudo-labels for unlabeled data and supervise the training of the student network. The student network has the same architecture as the teacher. Moreover, confidence memory is used to ensure the reliability of pseudo-labels generated by the teacher network and reference network.
  • Figure 3: Scatter plots comparing the scores predicted by the baseline and by our method on the MTL-AQA dataset. The black line indicates the ground truth, while the pink points represent the predicted scores.
  • Figure 4: Case study with qualitative results on four video samples from the MTL-AQA dataset, comparing the scores predicted by the baseline and by our method against the ground truth (GT). 'Baseline' refers to the traditional teacher-student network.
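
The two-stage schedule described in the Figure 2 caption (a burn-in stage followed by teacher-reference-student learning) can be sketched as a loss that changes with the training stage. This is only an illustration under stated assumptions: that burn-in uses labeled data alone, that losses are squared errors, and that the teacher and reference pseudo-label terms are summed and scaled by a single weight. None of these details are given in the text above, and the names `student_loss`, `burn_in_epochs`, and `unsup_weight` are hypothetical.

```python
from typing import Optional


def squared_error(pred: float, target: float) -> float:
    return (pred - target) ** 2


def student_loss(
    epoch: int,
    burn_in_epochs: int,
    pred_labeled: float,
    gt_score: float,
    pred_unlabeled: Optional[float] = None,
    teacher_pseudo: Optional[float] = None,
    reference_pseudo: Optional[float] = None,
    unsup_weight: float = 1.0,
) -> float:
    """Loss for one labeled and one unlabeled sample at a given epoch."""
    # Supervised term on labeled data, used in both stages.
    supervised = squared_error(pred_labeled, gt_score)

    # Stage 1 (assumed): burn-in trains on labeled data only.
    if epoch < burn_in_epochs:
        return supervised

    # Stage 2: the student is also supervised by pseudo-labels from the
    # teacher and the reference network (assumed here to be simply summed).
    if pred_unlabeled is None or teacher_pseudo is None or reference_pseudo is None:
        raise ValueError("stage 2 needs an unlabeled prediction and both pseudo-labels")
    unsupervised = (
        squared_error(pred_unlabeled, teacher_pseudo)
        + squared_error(pred_unlabeled, reference_pseudo)
    )
    return supervised + unsup_weight * unsupervised


burn_in = student_loss(epoch=0, burn_in_epochs=5, pred_labeled=0.8, gt_score=1.0)
joint = student_loss(
    epoch=5, burn_in_epochs=5, pred_labeled=0.8, gt_score=1.0,
    pred_unlabeled=0.5, teacher_pseudo=0.6, reference_pseudo=0.3,
)
```

With these inputs the burn-in loss contains only the supervised term, while the joint loss adds the two pseudo-label terms; how the paper actually weights or combines them is not stated here.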