FineCausal: A Causal-Based Framework for Interpretable Fine-Grained Action Quality Assessment

Ruisheng Han, Kanglei Zhou, Amir Atapour-Abarghouei, Xiaohui Liang, Hubert P. H. Shum

TL;DR

FineCausal addresses interpretability and robustness in Action Quality Assessment (AQA) by integrating a Graph Attention Network–based causal intervention with a Temporal Causal Attention module to produce fine-grained spatio-temporal representations. It explicitly models causal pathways among original, fused, and stage features to mitigate background confounds, while providing interpretable feedback on stage influence. On the FineDiving-HM dataset, it achieves state-of-the-art performance with a Spearman's $\rho$ of $0.9447$ and an $R_{\ell_2}$ of $0.2338$, and offers revealing attention maps for both spatial and temporal cues. Limitations include reliance on expert-defined causal structures and high-quality annotations, suggesting future work in semi-supervised or automated annotation techniques to broaden applicability.
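For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how they are typically computed. Spearman's $\rho$ is the Pearson correlation of the rank-transformed scores. For $R_{\ell_2}$, the sketch assumes the relative $\ell_2$ distance commonly used in the AQA literature, $R_{\ell_2} = \frac{100}{N}\sum_i \left(\frac{|y_i - \hat{y}_i|}{y_{\max} - y_{\min}}\right)^2$; whether FineCausal uses exactly this normalization and scaling is an assumption, not something stated in this summary. The function names are illustrative and are not taken from the released code.

```python
import numpy as np


def average_ranks(values):
    """Rank values from 1..N; tied values share the mean of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    sorted_vals = values[order]
    ranks = np.empty(len(values), dtype=float)
    start = 0
    while start < len(values):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(values) and sorted_vals[end + 1] == sorted_vals[start]:
            end += 1
        ranks[order[start:end + 1]] = 0.5 * (start + end) + 1.0
        start = end + 1
    return ranks


def spearman_rho(y_true, y_pred):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return float(np.corrcoef(average_ranks(y_true), average_ranks(y_pred))[0, 1])


def relative_l2(y_true, y_pred, y_min=None, y_max=None):
    """Relative L2 distance (x100), assuming the common AQA definition.

    y_min / y_max default to the range of the ground-truth scores (assumption).
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    y_min = y_true.min() if y_min is None else y_min
    y_max = y_true.max() if y_max is None else y_max
    return float(100.0 * np.mean(((y_true - y_pred) / (y_max - y_min)) ** 2))
```

Higher $\rho$ is better (it rewards correct ranking of performances), while lower $R_{\ell_2}$ is better (it penalizes numerical score error relative to the score range).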

Abstract

Action quality assessment (AQA) is critical for evaluating athletic performance, informing training strategies, and ensuring safety in competitive sports. However, existing deep learning approaches often operate as black boxes and are vulnerable to spurious correlations, limiting both their reliability and interpretability. In this paper, we introduce FineCausal, a novel causal-based framework that achieves state-of-the-art performance on the FineDiving-HM dataset. Our approach leverages a Graph Attention Network-based causal intervention module to disentangle human-centric foreground cues from background confounders, and incorporates a temporal causal attention module to capture fine-grained temporal dependencies across action stages. This dual-module strategy enables FineCausal to generate detailed spatio-temporal representations that not only achieve state-of-the-art scoring performance but also provide transparent, interpretable feedback on which features drive the assessment. Despite its strong performance, FineCausal requires extensive expert knowledge to define causal structures and depends on high-quality annotations, challenges that we discuss and address as future research directions. Code is available at https://github.com/Harrison21/FineCausal.
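To make the graph-attention component more concrete, here is a minimal NumPy sketch of a single-head Graph Attention Network layer in its standard form (attention logits $e_{ij} = \mathrm{LeakyReLU}(a^\top [W h_i \,\Vert\, W h_j])$, normalized with a softmax over each node's neighbors). It illustrates the generic mechanism only: the node set, feature sizes, and adjacency below are hypothetical, and this is not the authors' causal intervention module.

```python
import numpy as np


def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)


def gat_layer(h, adj, W, a):
    """Single-head graph attention layer (generic sketch).

    h:   (N, F_in) node features
    adj: (N, N) boolean; adj[i, j] is True if node i attends to node j.
         Every row needs at least one True entry (e.g. self-loops).
    W:   (F_in, F_out) shared linear projection
    a:   (2 * F_out,) attention vector
    Returns (updated features (N, F_out), attention weights (N, N)).
    """
    z = h @ W
    f_out = z.shape[1]
    # a^T [z_i || z_j] splits into a source term plus a target term.
    src = z @ a[:f_out]
    dst = z @ a[f_out:]
    e = leaky_relu(src[:, None] + dst[None, :])
    e = np.where(adj, e, -np.inf)           # non-neighbors get zero weight
    e = e - e.max(axis=1, keepdims=True)    # numerically stable softmax
    alpha = np.exp(e)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)
    return alpha @ z, alpha


# Hypothetical two-node graph: node 0 = original features, node 1 = fused features.
rng = np.random.default_rng(0)
h = rng.normal(size=(2, 4))
W = rng.normal(size=(4, 3))
a = rng.normal(size=6)
adj = np.array([[True, False],
                [True, True]])
out, alpha = gat_layer(h, adj, W, a)
```

The returned `alpha` matrix is the kind of quantity an attention-map visualization would display: each row shows how strongly one node draws on the others.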

Paper Structure

This paper contains 20 sections, 10 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: The architecture of FineCausal. The model takes a query and an exemplar video as input, extracting video and mask features through an I3D backbone. These features are fused and refined using a GAT-based causal intervention module to remove spurious correlations, producing deconfounded features. The refined features are then processed through a temporal causal attention mechanism, which decomposes the action into forward, twist, and entry stages. A regressor aggregates stage-wise contributions to predict the query action score $Y_{\text{Query}}$, adjusted based on the exemplar score $Y_{\text{Exemplar}}$, ensuring robust AQA.
  • Figure 2: The causal graph of our AQA framework. Nodes represent variables: $\mathbf{O}$ for original video features, $\mathbf{F}$ for fused video features, $\mathbf{S}$ for stage features, and $\mathbf{Y}$ for final action score. Solid arrows indicate true causal relationships, whereas dashed arrows represent spurious correlations.
  • Figure 3: Illustration of stage-wise decomposition in action sequences. The movement is split into three stages: forward, twist, and entry. Temporal causal attention models the influence of each stage on the next.
  • Figure 4: Visualization of attention mechanisms in our framework. (a) GAT attention weights between original and fused video features. (b) Temporal attention weights across different sub-action phases.
  • Figure 5: A failure case in which the athlete scores 0 due to an initial mistake in the Forward phase, negatively influencing subsequent Twist and Entry stages. The temporal causal attention highlights how an early misstep can propagate across phases, underscoring the importance of capturing causal dependencies in complex action sequences.
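
The stage-wise temporal attention described in the captions for Figures 3 and 5 can be illustrated with a causally masked attention over the three stages. The sketch below assumes plain scaled dot-product attention with a lower-triangular mask, so each stage attends only to itself and earlier stages; the paper's actual module may use learned projections and a different formulation, and the stage feature dimension is hypothetical.

```python
import numpy as np

STAGES = ("forward", "twist", "entry")


def causal_stage_attention(stage_feats):
    """Masked self-attention over ordered stage features (illustrative sketch).

    stage_feats: (T, D) array, one row per stage in temporal order.
    Returns (attended features (T, D), attention weights (T, T)).
    weights[t, s] is the influence of stage s on stage t, and is zero for s > t.
    """
    x = np.asarray(stage_feats, dtype=float)
    t, d = x.shape
    scores = x @ x.T / np.sqrt(d)
    causal_mask = np.tril(np.ones((t, t), dtype=bool))   # no attention to the future
    scores = np.where(causal_mask, scores, -np.inf)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ x, weights


rng = np.random.default_rng(0)
feats = rng.normal(size=(len(STAGES), 8))
attended, weights = causal_stage_attention(feats)
```

Under this masking, an error in the forward stage can influence the twist and entry representations but not the reverse, which matches the propagation pattern described for the failure case in Figure 5.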