
Continual Action Quality Assessment via Adaptive Manifold-Aligned Graph Regularization

Kanglei Zhou, Qingyi Pan, Xingxing Zhang, Hubert P. H. Shum, Frederick W. B. Li, Xiaohui Liang, Liyuan Wang

TL;DR

This paper formulates Continual AQA (CAQA) to address non-stationary score distributions in action quality assessment and shows that full-parameter fine-tuning (FPFT) is necessary but prone to overfitting and feature-manifold drift. It introduces MAGR++, a principled continual learning (CL) framework that combines layer-adaptive FPFT, a Manifold Projector, and an Intra-Inter-Joint Graph Regularizer to enable robust continual regression with feature replay. The authors provide a theoretical forgetting bound and demonstrate state-of-the-art performance on four CAQA benchmarks built from three datasets, with average Spearman's rank correlation (SRCC) gains over the strongest baseline of about 3.6% offline and 12.2% online. The work offers practical, memory-efficient strategies for adapting AQA models to evolving distributions and lays groundwork for broader continual learning in fine-grained video understanding, including potential multi-modal extensions and real-time deployment.
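
SRCC is the headline metric behind the reported gains. As a reference point, the sketch below computes it in the conventional way, as the Pearson correlation of the two rank vectors with tied values assigned their average rank. It is not taken from the MAGR++ code; the helper names are illustrative, and `scipy.stats.spearmanr` computes the same statistic.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` across a run of tied values in sorted order.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant inputs")
    return cov / math.sqrt(var_x * var_y)


def srcc(predicted, ground_truth):
    """Spearman's rank correlation: Pearson correlation of the rank vectors."""
    if len(predicted) != len(ground_truth) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least two items")
    return pearson(average_ranks(predicted), average_ranks(ground_truth))


# Predicted vs. judged scores for four hypothetical videos: one adjacent pair is swapped.
predicted = [82.1, 75.4, 90.3, 60.2]
ground_truth = [78.0, 79.5, 88.0, 65.0]
print(srcc(predicted, ground_truth))  # 0.8
```

Because SRCC only sees ranks, a model can score well on it while its absolute scores are off, which is why AQA papers typically also report an error metric such as the rMSE used in Figure 2.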

Abstract

Action Quality Assessment (AQA) quantifies the quality of human actions in videos, supporting applications in sports scoring, rehabilitation, and skill evaluation. A major challenge lies in the non-stationary nature of quality distributions in real-world scenarios, which limits the generalization ability of conventional methods. We introduce Continual AQA (CAQA), which equips AQA with Continual Learning (CL) capabilities to handle evolving distributions while mitigating catastrophic forgetting. Although parameter-efficient fine-tuning (PEFT) of pretrained models has shown promise in CL for image classification, we find it insufficient for CAQA. Our empirical and theoretical analyses reveal two insights: (i) Full-Parameter Fine-Tuning (FPFT) is necessary for effective representation learning; yet (ii) uncontrolled FPFT induces overfitting and feature manifold shift, thereby aggravating forgetting. To address this, we propose Adaptive Manifold-Aligned Graph Regularization (MAGR++), which couples layer-adaptive backbone fine-tuning, stabilizing shallow layers while adapting deeper ones, with a two-step feature rectification pipeline: a manifold projector that translates deviated historical features into the current representation space, and a graph regularizer that aligns local and global distributions. We construct four CAQA benchmarks from three datasets with tailored evaluation protocols and strong baselines, enabling systematic cross-dataset comparison. Extensive experiments show that MAGR++ achieves state-of-the-art performance, with average correlation gains of 3.6% offline and 12.2% online over the strongest baseline, confirming its robustness and effectiveness. Our code is available at https://github.com/ZhouKanglei/MAGRPP.
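
The graph regularizer is described here only at a high level: it aligns local and global distributions and, per Figure 4(d), aligns the feature space with the quality-score space. A minimal sketch of that idea, under our own assumptions, builds one distance graph over a batch in feature space and another in score space, then penalizes their disagreement. The function names, the max-normalization, and the squared-error penalty are hypothetical choices; this is not the paper's Intra-Inter-Joint formulation.

```python
import math


def distance_graph(points):
    """Dense Euclidean distance matrix, scaled so the largest entry is 1."""
    graph = [[math.dist(p, q) for q in points] for p in points]
    peak = max(max(row) for row in graph)
    if peak == 0:
        return graph  # all points coincide; nothing to normalize
    return [[d / peak for d in row] for row in graph]


def graph_alignment_loss(features, scores):
    """Hypothetical graph regularizer: mean squared gap between the
    feature-space and score-space distance graphs over all sample pairs.

    A low value means samples with similar quality scores sit close together
    in feature space and samples with distant scores sit far apart.
    """
    if len(features) != len(scores) or len(features) < 2:
        raise ValueError("need at least two (feature, score) pairs")
    feat_graph = distance_graph(features)
    score_graph = distance_graph([(s,) for s in scores])
    n = len(features)
    gaps = [
        (feat_graph[i][j] - score_graph[i][j]) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    ]
    return sum(gaps) / len(gaps)


# Features whose geometry mirrors the scores incur (near) zero loss ...
print(graph_alignment_loss([(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)], [0.0, 2.0, 6.0]))
# ... while a layout that contradicts the scores is penalized (about 0.667).
print(graph_alignment_loss([(0.0, 0.0), (0.0, 0.0), (1.0, 0.0)], [0.0, 1.0, 1.0]))
```

One plausible use, again an assumption, is to evaluate such a term on the union of replayed (rectified) old features and current-session features, so that old and new samples share one score-consistent geometry.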

Paper Structure

This paper contains 24 sections, 2 theorems, 35 equations, 16 figures, 17 tables, 3 algorithms.

Key Result

Theorem 1

Let the upstream model $\phi_{\bm{\theta}_{\text{up}}}$ denote a pre-trained AQA scorer on a source domain $\mathcal{D}_{\text{up}}$ (typically large-scale action recognition), and define the downstream task on a target AQA domain $\mathcal{D}_{\text{down}}$ with a distinct data distribution. The go… Under these assumptions, for any PEFT parameter $\bm{\alpha}$ (such that $\bm{v}=\mathbf{U}\bm{\alpha}$ …

Figures (16)

  • Figure 1: Motivation and challenges of CAQA. Panels (a) and (b) illustrate the inherent limitations of conventional AQA methods, while (c) and (d) demonstrate that even strong CL baselines exhibit large performance gaps on CAQA benchmarks in both offline and online settings.
  • Figure 2: SRCC and rMSE comparison of fixed backbone, PEFT (I3D-Adapters), and FPFT in representative AQA tasks.
  • Figure 3: PEFT works well when upstream models are strong and downstream tasks are simple. FPFT suits the opposite case, a weaker upstream model paired with a complex downstream task, as in AQA.
  • Figure 4: Core idea of MAGR++: (a) Old features (blue circles) deviate from the current manifold (orange curve) due to manifold shift; (b) Mixing old and new features (green circles) leads to confusion in score regression; (c) The manifold projector translates old features from the previous manifold (yellow curve) to the current one; (d) The feature space is further aligned with the quality score space.
  • Figure 5: Overview of MAGR++. At the end of session $t-1$ (a), representative features are selected via Ordered Uniform Sampling (OUS; b) and stored in the memory bank $\mathcal{M}$. At the start of session $t$ (c), the backbone is adapted with layer-adaptive FPFT (d) to balance stability and plasticity. A Manifold Projector (MP) is then trained (e) to align old features with the evolving feature space (f), enabling effective replay and regressor adaptation (g). Finally, the memory bank is refreshed with rectified old features and newly sampled prototypes (h).
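
Figure 5 names Ordered Uniform Sampling (OUS) as the rule for choosing which features enter the memory bank $\mathcal{M}$, but this page does not spell it out. One plausible reading, which is our assumption rather than the paper's definition, is to sort a session's samples by quality score and keep evenly spaced ranks, so the stored exemplars span the whole score range instead of clustering around common scores. The function name `ordered_uniform_sampling` and the `budget` parameter are hypothetical.

```python
def ordered_uniform_sampling(scores, budget):
    """Return `budget` sample indices spread evenly over the score-sorted order.

    Assumed reading of OUS (not the paper's exact rule): with two or more
    slots, the lowest- and highest-scored samples are always kept and the
    remaining slots go to evenly spaced ranks in between.
    """
    if budget <= 0:
        return []
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    if budget >= len(order):
        return order  # memory is large enough to keep every sample
    if budget == 1:
        return [order[len(order) // 2]]  # a single slot goes to the median-ranked sample
    step = (len(order) - 1) / (budget - 1)
    # step > 1 here, so rounding never maps two slots to the same rank
    return [order[round(k * step)] for k in range(budget)]


session_scores = [7.5, 3.0, 9.1, 5.2, 6.0, 8.4, 4.4, 2.1, 6.8, 5.9]
kept = ordered_uniform_sampling(session_scores, 4)
print(kept)                                # [7, 3, 8, 2]
print([session_scores[i] for i in kept])   # [2.1, 5.2, 6.8, 9.1]
```

Spreading exemplars over the score range, rather than sampling at random, keeps rare very low and very high scores represented during replay; whether this matches OUS exactly should be checked against the released code.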

Theorems & Definitions (4)

  • Theorem 1
  • Theorem 2
  • Proof 1
  • Proof 2