A Topology-Aware Graph Convolutional Network for Human Pose Similarity and Action Quality Assessment
Minmin Zeng
TL;DR
This work tackles fine-grained pose similarity for Action Quality Assessment by modeling the human skeleton as a topology-aware graph. It introduces GCN-PSN, a Siamese network that maps a 15-keypoint skeleton to a 50-d pose embedding through a two-layer GCN followed by a two-layer MLP. The similarity of two poses is measured by the cosine distance between their embeddings and mapped to a 0–100 score. Training uses a contrastive regression loss $\mathrm{Loss} = \frac{1}{2} Y D_c^2 + \frac{1}{2} (1-Y) \max(0, m - D_c)^2$, where $D_c = 1 - \frac{F_1^* \cdot F_2^*}{\|F_1^*\|\|F_2^*\|}$ is the cosine distance between the two embeddings $F_1^*$ and $F_2^*$, $Y$ is the pair label, and $m = 1.35$ is the margin. The first term pulls similar pairs together and the second pushes dissimilar pairs apart until their distance reaches the margin, which yields discriminative, topology-aware embeddings. Experiments on AQA-7 and FineDiving show that GCN-PSN outperforms topology-agnostic baselines and is competitive with state-of-the-art video-based methods, supporting skeletal topology as a useful prior for pose similarity and action quality evaluation.
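To make the loss above concrete, the following is a minimal pure-Python sketch, not the authors' implementation. It assumes that $Y$ lies in $[0, 1]$ with $Y = 1$ marking a similar pair, as the form of the loss suggests. The function names `cosine_distance` and `contrastive_loss` are illustrative, and the embeddings are plain lists of floats standing in for the 50-d vectors $F_1^*$ and $F_2^*$.

```python
import math


def cosine_distance(f1, f2):
    """D_c = 1 - (f1 . f2) / (||f1|| ||f2||); ranges over [0, 2]."""
    dot = sum(a * b for a, b in zip(f1, f2))
    norm1 = math.sqrt(sum(a * a for a in f1))
    norm2 = math.sqrt(sum(b * b for b in f2))
    return 1.0 - dot / (norm1 * norm2)


def contrastive_loss(f1, f2, y, margin=1.35):
    """Loss = 1/2 * Y * D_c^2 + 1/2 * (1 - Y) * max(0, m - D_c)^2.

    Assumption: y is in [0, 1], with y = 1 meaning the two poses are similar.
    """
    d = cosine_distance(f1, f2)
    pull = 0.5 * y * d ** 2                                # penalizes distance between similar pairs
    push = 0.5 * (1 - y) * max(0.0, margin - d) ** 2       # penalizes dissimilar pairs closer than the margin
    return pull + push


# Toy 2-d embeddings (illustrative only; the paper's embeddings are 50-d).
same = contrastive_loss([1.0, 0.0], [1.0, 0.0], y=1)       # identical, labeled similar
ortho = contrastive_loss([1.0, 0.0], [0.0, 1.0], y=0)      # orthogonal, labeled dissimilar
opposite = contrastive_loss([1.0, 0.0], [-1.0, 0.0], y=0)  # opposite, labeled dissimilar
```

Because the cosine distance lies in $[0, 2]$, a margin of $m = 1.35$ means that a dissimilar pair is still penalized when its embeddings are orthogonal ($D_c = 1$) and stops contributing only once $D_c \ge 1.35$.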
Abstract
Action Quality Assessment (AQA) requires fine-grained understanding of human motion and precise evaluation of pose similarity. This paper proposes a topology-aware Graph Convolutional Network (GCN) framework, termed GCN-PSN, which models the human skeleton as a graph to learn discriminative, topology-sensitive pose embeddings. Using a Siamese architecture trained with a contrastive regression objective, the method outperforms coordinate-based baselines and achieves competitive performance on the AQA-7 and FineDiving benchmarks. Experimental results and ablation studies validate the effectiveness of leveraging skeletal topology for pose similarity and action quality assessment.
