Uncertainty-Driven Action Quality Assessment

Caixia Zhou; Yaping Huang

TL;DR

UD-AQA tackles the subjectivity of expert action quality scoring by introducing a CVAE-based latent branch that models uncertainty among judges and enables sampling of multiple plausible scores. The approach combines a deterministic video-feature branch with a latent uncertainty branch. It uses the estimated uncertainty to re-weight the regression loss and to guide a curriculum-like training order that moves from low- to high-uncertainty samples. Empirical results on MTL-AQA, FineDiving, and JIGSAWS show competitive or state-of-the-art performance, with notable gains in ranking metrics and robustness, highlighting the practical value of modeling judge variability in AQA. The method can generate a variable number of scores to match different judge counts, and its uncertainty estimates reflect confidence in each prediction, supporting more reliable decision-making in real-world applications.
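A minimal sketch of the sampling idea in plain Python, assuming a diagonal-Gaussian latent and a hypothetical linear decoder. The names `reparameterize`, `decode_score`, `sample_judge_scores`, `summarize`, and `gaussian_kl` are illustrative, not the authors' code. Each judge score comes from a fresh latent draw combined with the same deterministic video feature; here the mean of the draws serves as the point estimate and their standard deviation as the uncertainty (an assumed summary). The KL term is the standard closed form used in CVAE training to pull a posterior q(z | x, y) toward a prior p(z | x); how UD-AQA parameterizes its prior and posterior is not stated in this summary.

```python
import math
import random
from statistics import mean, stdev


def reparameterize(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, 1) per dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def decode_score(video_feat, z, w_video, w_latent, bias):
    """Hypothetical linear decoder combining the deterministic video feature with latent z."""
    deterministic = sum(w * f for w, f in zip(w_video, video_feat))
    stochastic = sum(w * zi for w, zi in zip(w_latent, z))
    return deterministic + stochastic + bias


def sample_judge_scores(video_feat, prior_mu, prior_log_var, params, num_judges, seed=0):
    """Draw one latent per judge from the prior and decode one score per draw."""
    rng = random.Random(seed)
    w_video, w_latent, bias = params
    return [
        decode_score(video_feat, reparameterize(prior_mu, prior_log_var, rng),
                     w_video, w_latent, bias)
        for _ in range(num_judges)
    ]


def summarize(scores):
    """Point estimate = mean of the sampled scores; uncertainty = their sample standard deviation."""
    return mean(scores), stdev(scores)


def gaussian_kl(mu_q, log_var_q, mu_p, log_var_p):
    """Closed-form KL(q || p) between two diagonal Gaussians, summed over latent dimensions."""
    return 0.5 * sum(
        lvp - lvq + (math.exp(lvq) + (mq - mp) ** 2) / math.exp(lvp) - 1.0
        for mq, lvq, mp, lvp in zip(mu_q, log_var_q, mu_p, log_var_p)
    )


# Example: 7 judges, 2-D video feature, 2-D latent (all numbers are made up).
feat = [1.0, 2.0]
params = ([0.5, 0.25], [1.0, 1.0], 0.1)
scores = sample_judge_scores(feat, [0.0, 0.0], [0.0, 0.0], params, num_judges=7, seed=42)
prediction, uncertainty = summarize(scores)
```

In this sketch `num_judges` is only a loop count, so the same parameters can emit 3, 5, or 7 scores; shrinking the prior variance collapses the samples toward the deterministic score and drives the estimated uncertainty toward zero.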

Abstract

Automatic action quality assessment (AQA) has attracted increasing attention due to its wide range of applications. However, most existing AQA methods employ deterministic models to predict a single final score for each action, overlooking the subjectivity and diversity among expert judges during scoring. In this paper, we propose a novel probabilistic model, named Uncertainty-Driven AQA (UD-AQA), to capture and exploit the diversity among multiple judge scores. Specifically, we design a Conditional Variational Auto-Encoder (CVAE)-based module to encode the uncertainty in expert assessment, so that multiple judge scores can be produced by sampling latent features from the learned latent space multiple times. To further exploit this uncertainty, we estimate the uncertainty of each prediction and use it to re-weight the AQA regression loss, which reduces the influence of uncertain samples during training. Moreover, we design an uncertainty-guided training strategy that dynamically adjusts the learning order of samples from low uncertainty to high uncertainty. Experiments show that our method achieves competitive results on three benchmarks: the Olympic-event datasets MTL-AQA and FineDiving, and the surgical-skill dataset JIGSAWS.
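The two uses of uncertainty described in the abstract can be sketched as follows, assuming a per-sample uncertainty `u` (for example, the spread of sampled scores) is already available. The weight exp(-u) and the linear pacing from `start_frac` to the full dataset are hypothetical choices for illustration; the abstract states only that uncertain samples are down-weighted and that training proceeds from low to high uncertainty.

```python
import math


def uncertainty_weighted_mse(preds, targets, uncertainties):
    """Weighted mean squared error; each sample's weight exp(-u) shrinks as its uncertainty u grows.
    The exp(-u) form is a hypothetical choice, not taken from the paper."""
    weights = [math.exp(-u) for u in uncertainties]
    total = sum(w * (p - t) ** 2 for w, p, t in zip(weights, preds, targets))
    return total / sum(weights)


def curriculum_subset(uncertainties, epoch, num_epochs, start_frac=0.5):
    """Indices of the samples used at `epoch`, sorted from lowest to highest uncertainty.
    The visible fraction grows linearly from start_frac (first epoch) to 1.0 (last epoch)."""
    order = sorted(range(len(uncertainties)), key=lambda i: uncertainties[i])
    progress = min(epoch / max(num_epochs - 1, 1), 1.0)
    frac = start_frac + (1.0 - start_frac) * progress
    count = max(1, math.ceil(frac * len(order)))
    return order[:count]


# Example with four samples (made-up uncertainties).
unc = [0.9, 0.1, 0.5, 0.3]
first_epoch = curriculum_subset(unc, epoch=0, num_epochs=5)  # -> [1, 3]
last_epoch = curriculum_subset(unc, epoch=4, num_epochs=5)   # -> [1, 3, 2, 0]
```

If the uncertainty is produced by the same network being trained, it is usually treated as a constant inside the weight (or paired with a penalty term), since otherwise the model could lower the weighted loss simply by inflating its uncertainty.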

Paper Structure

This paper contains 28 sections, 10 equations, 3 figures, and 6 tables.

Introduction
Related Work
Action Quality Assessment
Conditional Variational Auto-Encoder
Proposed Approach
Video-level Feature Extraction
CVAE-based Module
Uncertainty Estimation
Loss Function
Uncertainty-guided Training Process
Experiment
Datasets
Metrics
Implementation Details
Results on MTL-AQA Dataset
...and 13 more sections

Figures (3)

Figure 1: Previous deterministic models versus our probabilistic UD-AQA model. The top panel illustrates the limitation of previous deterministic models, which can predict only a single deterministic score. In contrast, the bottom panel shows UD-AQA, which adds a latent branch encoding the uncertainty among judge scores. UD-AQA combines latent and deterministic features to generate a prediction, and by sampling multiple times it can produce diverse results.
Figure 2: The overall framework of our proposed UD-AQA. Each input video is segmented into $K$ overlapping clips, which are fed into the I3D backbone to extract clip-level features. A weight attention (WA) module then aggregates the clip-level features into a video-level feature. To model the ambiguity among judges, a CVAE-based module projects each video and its corresponding judge scores into a low-dimensional latent space, from which latent variables can be sampled to produce multiple outputs. Dashed lines and boxes indicate components used only during training.
Figure 3: Effect of the latent-space dimension.
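The front end described in the Figure 2 caption can be sketched as follows, assuming evenly spaced clip starts and softmax-weighted pooling for the weight attention (WA) module. The caption does not say how WA computes its weights, so `attention_pool` and its `scorer_weights` parameter are hypothetical stand-ins, and the I3D backbone is replaced by precomputed clip features.

```python
import math


def overlapping_clips(num_frames, num_clips, clip_len):
    """Return num_clips (start, end) windows of clip_len frames with evenly spaced starts.
    Neighboring windows overlap whenever clip_len is larger than the stride between starts."""
    if not 1 <= clip_len <= num_frames:
        raise ValueError("clip_len must be between 1 and num_frames")
    if num_clips == 1:
        return [(0, clip_len)]
    stride = (num_frames - clip_len) / (num_clips - 1)
    return [(round(i * stride), round(i * stride) + clip_len) for i in range(num_clips)]


def attention_pool(clip_feats, scorer_weights):
    """Softmax-weighted average of clip features (hypothetical stand-in for the WA module).
    Each clip's scalar score is a dot product with scorer_weights."""
    scores = [sum(w * f for w, f in zip(scorer_weights, feat)) for feat in clip_feats]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    dim = len(clip_feats[0])
    pooled = [sum(a * feat[d] for a, feat in zip(alphas, clip_feats)) for d in range(dim)]
    return pooled, alphas


clips = overlapping_clips(num_frames=96, num_clips=5, clip_len=32)
# -> [(0, 32), (16, 48), (32, 64), (48, 80), (64, 96)]
video_feat, weights = attention_pool([[1.0, 0.0], [0.0, 1.0]], scorer_weights=[0.0, 0.0])
# equal scores give equal weights, so video_feat == [0.5, 0.5]
```

With these illustrative numbers (not the paper's settings), clips start every 16 frames, so neighboring 32-frame clips share half their frames; a larger `num_clips` shrinks the stride and increases the overlap.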
