CaFlow: Enhancing Long-Term Action Quality Assessment with Causal Counterfactual Flow

Ruisheng Han, Kanglei Zhou, Shuang Chen, Amir Atapour-Abarghouei, Hubert P. H. Shum

TL;DR

CaFlow addresses long-term Action Quality Assessment under confounding context and extended temporal dynamics. It combines a Causal Counterfactual Regularization module to disentangle causal and confounding cues via a front-door-inspired scheme and counterfactual feature swaps, with a Bidirectional Time-conditioned Flow that enforces forward-backward cycle-consistency. The two modules produce stable, causally focused representations $H_i^1$ from initial features $H_i^0$, which are regressed to action scores. Experiments on RG, FIS-V, and LOGO show state-of-the-art performance, demonstrating robustness to context shifts and improved temporal coherence with practical applicability in sports analytics and rehabilitation.
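To make the idea of a counterfactual feature swap concrete, here is a minimal NumPy sketch. It is not taken from the CaFlow code base: the function name `counterfactual_swap`, the soft `causal_mask` (standing in for the output of a feature separator), and the donor permutation `perm` are all hypothetical, and the actual CCR module may mix features differently.

```python
import numpy as np

def counterfactual_swap(features, causal_mask, perm):
    """Keep each sample's causal clips and swap in context from a donor sample.

    features:    (B, T, D) clip-level features for B videos of T clips each
    causal_mask: (B, T) weights in [0, 1]; 1 marks a clip as causal
                 (hypothetical output of a separator network)
    perm:        (B,) index of the donor video for each sample
    """
    m = causal_mask[..., None]  # (B, T, 1), broadcast over the feature dimension
    # Causal clips stay in place; confounding clips are replaced by the
    # donor's features at the same temporal positions.
    return m * features + (1.0 - m) * features[perm]
```

Under these assumptions, a score regressor that relies only on causal cues should give similar predictions for `features` and for the swapped output, which is the kind of invariance a counterfactual regularizer can reward.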

Abstract

Action Quality Assessment (AQA) predicts fine-grained execution scores from action videos and is widely applied in sports, rehabilitation, and skill evaluation. Long-term AQA, as in figure skating or rhythmic gymnastics, is especially challenging since it requires modeling extended temporal dynamics while remaining robust to contextual confounders. Existing approaches either depend on costly annotations or rely on unidirectional temporal modeling, making them vulnerable to spurious correlations and unstable long-term representations. To this end, we propose CaFlow, a unified framework that integrates counterfactual de-confounding with bidirectional time-conditioned flow. The Causal Counterfactual Regularization (CCR) module disentangles causal and confounding features in a self-supervised manner and enforces causal robustness through counterfactual interventions, while the BiT-Flow module models forward and backward dynamics with a cycle-consistency constraint to produce smoother and more coherent representations. Extensive experiments on multiple long-term AQA benchmarks demonstrate that CaFlow achieves state-of-the-art performance. Code is available at https://github.com/Harrison21/CaFlow
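The cycle-consistency constraint can be illustrated with a small sketch, assuming the flow is described by two time-conditioned velocity fields and integrated with plain Euler steps. The names `integrate`, `cycle_consistency_loss`, `v_fwd`, and `v_bwd` are hypothetical, and the squared-error penalty is one common choice rather than the paper's confirmed formulation.

```python
import numpy as np

def integrate(h, velocity, t_start, t_end, steps):
    """Euler integration of dh/dt = velocity(h, t) from t_start to t_end."""
    dt = (t_end - t_start) / steps  # negative when integrating backward in time
    t = t_start
    for _ in range(steps):
        h = h + dt * velocity(h, t)
        t += dt
    return h

def cycle_consistency_loss(h0, v_fwd, v_bwd, steps=10):
    """Map H^0 forward to H^1, map it back, and penalize the reconstruction gap.

    v_fwd, v_bwd: callables (h, t) -> array shaped like h; these stand in for
                  learned forward and backward velocity networks (assumption).
    Returns the refined features H^1 and the mean squared cycle error.
    """
    h1 = integrate(h0, v_fwd, 0.0, 1.0, steps)      # forward pass: t = 0 -> 1
    h0_rec = integrate(h1, v_bwd, 1.0, 0.0, steps)  # backward pass: t = 1 -> 0
    loss = float(np.mean((h0_rec - h0) ** 2))
    return h1, loss
```

In this sketch the loss is near zero when the backward field undoes the forward one and grows as the two fields disagree. Only `h1` would be passed on to a score regressor.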

Paper Structure

This paper contains 15 sections, 14 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: Framework of CaFlow. Our method tackles confounding and domain shift in AQA with two key contributions: (1) Causal Counterfactual Regularization (CCR), which uses a Causal Feature Separator and counterfactual mixing to separate causal from confounding clips and impose a triplet-style causal loss; (2) Bidirectional Time-conditioned Flow (BiT-Flow), which progressively transforms $H_i^{0}$ into the AQA-specific representation $H_i^{1}$ with forward-backward consistency and optimal-transport regularization. An MLP then regresses the refined representation to the quality score.
  • Figure 2: The causal graph of our AQA framework. Nodes represent variables: $H_i^0$ for the initial video features, $H_i^1$ for the desired features, $C$ for the confounder, $H_{co\_i}^0$ for the confounding features, $H_{ca\_i}^0$ for the causal features, and $Y$ for the final action score. Solid arrows ($\rightarrow$) indicate true causal relationships, whereas dashed arrows ($\dashrightarrow$) represent spurious relationships.
  • Figure 3: Error analysis on RG. (a) Boxplots of absolute errors with annotated statistics (mean/median/std). (b) Cumulative error-accuracy curves with area under the curve (AUC).
  • Figure 4: Three representative routines with key frames per case.