RICA^2: Rubric-Informed, Calibrated Assessment of Actions

Abrar Majeedi, Viswanatha Reddy Gajjala, Satya Sai Srinath Namburi GNVV, Yin Li

TL;DR

RICA^2 tackles action quality assessment (AQA) by combining a human scoring rubric, represented as a graph, with stochastic step embeddings that model prediction uncertainty. The method encodes action steps and the rubric as a directed acyclic graph (DAG), propagates probabilistic leaf embeddings over this graph with a graph neural network (GNN), and decodes a final score from the root node. The model is trained under a variational information bottleneck (VIB) objective that balances predictive accuracy against embedding complexity. Empirically, RICA^2 achieves state-of-the-art performance on FineDiving, MTL-AQA, and JIGSAWS, delivers well-calibrated uncertainty estimates, and uses an exemplar-free inference pipeline. The approach points toward trustworthy AQA in high-stakes domains such as sports analytics and surgical education: uncertain cases can be routed to human review, and predictions follow rubric-aligned reasoning.
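The VIB trade-off described above can be made concrete with a small sketch. It is not the authors' implementation: it assumes each step embedding is a diagonal Gaussian parameterized by a mean and log-variance, that the prior is a standard normal, and that the fit term is a squared error on the score. The function names (`sample_embedding`, `kl_to_standard_normal`, `vib_loss`) and the weight `beta` are hypothetical.

```python
import math
import random


def sample_embedding(mu, log_var, rng):
    """Draw z = mu + sigma * eps (reparameterization) from a diagonal Gaussian."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def vib_loss(pred_score, true_score, step_params, beta=1e-3):
    """Squared-error fit term plus a beta-weighted KL complexity term.

    step_params is a list of (mu, log_var) pairs, one per action step.
    The standard-normal prior and the squared-error fit term are
    assumptions of this sketch, not details taken from the paper.
    """
    fit = (pred_score - true_score) ** 2
    complexity = sum(kl_to_standard_normal(mu, lv) for mu, lv in step_params)
    return fit + beta * complexity


if __name__ == "__main__":
    rng = random.Random(0)
    steps = [([0.2, -0.1], [-1.0, -0.5]), ([0.0, 0.3], [-0.2, -2.0])]
    samples = [sample_embedding(mu, lv, rng) for mu, lv in steps]
    print(len(samples), vib_loss(7.5, 8.0, steps, beta=0.01))
```

A larger `beta` pushes the step embeddings toward the prior (simpler, more uncertain embeddings), while a smaller `beta` favors fitting the score. Drawing several samples per step and decoding each one would give a spread of scores that can serve as an uncertainty estimate.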

Abstract

The ability to quantify how well an action is carried out, also known as action quality assessment (AQA), has attracted recent interest in the vision community. Unfortunately, prior methods often ignore the score rubric used by human experts and fall short of quantifying the uncertainty of the model prediction. To bridge the gap, we present RICA^2, a deep probabilistic model that integrates the score rubric and accounts for prediction uncertainty in AQA. Central to our method are stochastic embeddings of action steps, defined on a graph structure that encodes the score rubric. The embeddings spread probability density in the latent space and allow our method to represent model uncertainty. The graph encodes the scoring criteria, from which the quality scores can be decoded. We demonstrate that our method establishes a new state of the art on public benchmarks, including FineDiving, MTL-AQA, and JIGSAWS, with superior performance in score prediction and uncertainty calibration. Our code is available at https://abrarmajeedi.github.io/rica2_aqa/

Paper Structure (31 sections, 12 equations, 10 figures, 15 tables)

Figures (10)

  • Figure 1: RICA$^2$ integrates the score rubric used by human experts and accounts for prediction uncertainty, resulting in accurate predictions and calibrated uncertainty estimates.
  • Figure 1: Main results on (a) FineDiving and (b) MTL-AQA. Prediction accuracy ($SRCC$ and $R\ell_{2}$) and uncertainty calibration ($\tau$) metrics are reported. We compare our method with exemplar-based and exemplar-free baselines.
  • Figure 2: Overview of RICA$^2$. Leveraging scoring rubrics (a), RICA$^2$ integrates a graph representation of action steps and rubric with uncertainty modeling (b). Specifically, RICA$^2$ takes as input the video and its key action steps, encodes the input into embeddings (c), refines the embeddings through a deep probabilistic model, and outputs an action score together with its uncertainty estimate.
  • Figure 3: Results on FineDiving
  • Figure 4: Results on MTL-AQA
  • ...and 5 more figures