Hierarchical NeuroSymbolic Approach for Comprehensive and Explainable Action Quality Assessment

Lauren Okamoto, Paritosh Parmar

TL;DR

This paper tackles bias and explainability gaps in action quality assessment (AQA) by introducing a hierarchical neuro-symbolic framework for diving. It combines Neural Action-Context Parsing to extract interpretable symbols (platform position, poses, splash) with a Rules-based Action Analyzer that performs detailed dive recognition, temporal segmentation, and fine-grained quality assessment, culminating in a visio-linguistic report. The approach yields state-of-the-art performance in fine-grained action recognition and temporal segmentation, while providing objective, explainable scoring and supporting visuals validated by domain experts. The method is open-sourced and proposed as a generalizable blueprint for extending explainable AQA to other sports and precise actions, including surgery.

Abstract

Action quality assessment (AQA) applies computer vision to quantitatively assess the performance or execution of a human action. Current AQA approaches are end-to-end neural models, which lack transparency and tend to be biased because they are trained on subjective human judgements as ground-truth. To address these issues, we introduce a neuro-symbolic paradigm for AQA, which uses neural networks to abstract interpretable symbols from video data and makes quality assessments by applying rules to those symbols. We take diving as the case study. We found that domain experts prefer our system and find it more informative than purely neural approaches to AQA in diving. Our system also achieves state-of-the-art action recognition and temporal segmentation, and automatically generates a detailed report that breaks the dive down into its elements and provides objective scoring with visual evidence. As verified by a group of domain experts, this report may be used to assist judges in scoring, help train judges, and provide feedback to divers. Annotated training data and code: https://github.com/laurenok24/NSAQA.

Paper Structure (24 sections, 3 figures, 9 tables, 1 algorithm)

Figures (3)

  • Figure 1: Neuro-Symbolic Action Quality Assessment (NS-AQA) vs Neural AQA. Our NS-AQA approach (left) employs neural networks to extract crucial symbolic information, such as platform location, framewise pose estimation, and splash detection. These symbols provide objective data for rules-based fine-grained action recognition, temporal segmentation, and detailed error analysis. The outcome is an objective score and a comprehensive visio-linguistic report, complete with supporting visual evidence, generated programmatically. In contrast, existing AQA approaches (right) can only predict a single, potentially biased score without any accompanying explanation. A full-size version is in the supplementary material.
  • Figure 2: Somersault and Twist Counters Visualized. A): Visualization of the right-to-left hip vector. We count twists by counting the "petals" formed by this vector. B): Visualization of the pelvis-to-thorax vector. We count somersaults by counting the rotations of this vector over the course of the dive. Graphical Solutions (C and D): The somersault and twist counters applied to two different dives. The blue trace represents the vector we track, and the black circles in the twist plots mark the bounds within which "petals" are counted: a petal counts only if it passes beyond the inner black circle while staying inside the outer black circle. Each petal is 0.5 twists. C): A forward dive with 2.5 somersaults (2.5 revolutions from the initial vector pointing vertically up ($\uparrow$) to the final vector pointing vertically down ($\downarrow$) in the somersault plot) and 3 twists (6 "petals" in the twist plot). D): A forward dive in pike position with 3.5 somersaults (3.5 revolutions from the initial vector pointing vertically up ($\uparrow$) to the final vector pointing vertically down ($\downarrow$) in the somersault plot) and no twists (zero "petals" in the twist plot).
  • Figure 3: Temporal Segmentation Visualization. A dive is segmented into 4 phases: start/takeoff, twist, somersault, and entry.
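
The counting rules described in the Figure 2 caption can be illustrated with a minimal sketch. This is one possible reading of the caption, not the authors' implementation. The function names, the (x, y) input format, and the `inner` and `outer` radius thresholds are all hypothetical. The sketch also assumes the tracked vector rotates by less than half a turn between consecutive frames, and it uses synthetic traces in place of real pose estimates.

```python
import math


def count_somersaults(vectors):
    """Count revolutions of a 2D vector sequence, to the nearest half turn.

    `vectors` holds per-frame (x, y) pelvis-to-thorax vectors
    (hypothetical input format, not taken from the paper).
    """
    total = 0.0
    for (x0, y0), (x1, y1) in zip(vectors, vectors[1:]):
        # Signed angle between consecutive frames, in (-pi, pi].
        total += math.atan2(x0 * y1 - y0 * x1, x0 * x1 + y0 * y1)
    half_turns = round(abs(total) / math.pi)
    return half_turns / 2


def count_twists(vectors, inner, outer):
    """Count twists as 0.5 per "petal" of the right-to-left hip vector trace.

    A petal is an excursion whose radius rises above `inner` and returns
    below it without ever exceeding `outer` (both thresholds are assumed
    values). An excursion still open at the last frame is not counted.
    """
    petals = 0
    in_petal = False
    valid = True
    for x, y in vectors:
        r = math.hypot(x, y)
        if not in_petal and r > inner:
            in_petal, valid = True, True
        if in_petal:
            if r > outer:
                valid = False  # left the outer circle: discard this excursion
            if r <= inner:
                if valid:
                    petals += 1
                in_petal = False
    return petals * 0.5


def synthetic_rotation(revolutions, steps=200):
    """Unit vectors starting vertically up and rotating by `revolutions` turns."""
    angles = (math.pi / 2 + 2 * math.pi * revolutions * i / steps
              for i in range(steps + 1))
    return [(math.cos(a), math.sin(a)) for a in angles]


def synthetic_petals(n_petals, steps=600):
    """Rose-like trace whose radius rises to 1 and falls back `n_petals` times."""
    trace = []
    for i in range(steps + 1):
        t = i / steps
        r = abs(math.sin(n_petals * math.pi * t))
        trace.append((r * math.cos(2 * math.pi * t),
                      r * math.sin(2 * math.pi * t)))
    return trace


if __name__ == "__main__":
    # Synthetic stand-in for dive C in Figure 2: 2.5 somersaults, 6 petals.
    print(count_somersaults(synthetic_rotation(2.5)))
    print(count_twists(synthetic_petals(6), inner=0.3, outer=1.2))
```

Rounding the accumulated angle to the nearest half turn reflects the caption's description of dives that start with the vector pointing up and end with it pointing down. With real pose data, noise near the thresholds would likely call for smoothing before counting.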