A Topology-Aware Graph Convolutional Network for Human Pose Similarity and Action Quality Assessment

Minmin Zeng

TL;DR

This work tackles fine-grained pose similarity for Action Quality Assessment by modeling the human skeleton as a topology-aware graph. It introduces GCN-PSN, a Siamese network that learns a 50-dimensional pose embedding from a 15-keypoint skeleton via a two-layer GCN followed by a two-layer MLP; similarity is measured by cosine distance and mapped to a 0–100 score. Training uses the contrastive regression loss $\mathrm{Loss} = \frac{1}{2} Y D_c^2 + \frac{1}{2} (1-Y) \max(0, m - D_c)^2$, where $D_c = 1 - \frac{F_1^* \cdot F_2^*}{\|F_1^*\|\|F_2^*\|}$ is the cosine distance between the two embeddings $F_1^*$ and $F_2^*$ and the margin is $m = 1.35$. This objective encourages discriminative, topology-aware embeddings. Experiments on AQA-7 and FineDiving show that GCN-PSN outperforms topology-agnostic baselines and is competitive with state-of-the-art video-based methods, supporting skeletal topology as a useful prior for pose similarity and action quality evaluation.
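To make the loss above concrete, here is a minimal NumPy sketch of the cosine distance $D_c$ and the contrastive term for a single pair of embeddings. The function names `cosine_distance` and `contrastive_loss` are hypothetical, and the reading of $Y$ as a label in $[0, 1]$ with $Y = 1$ marking a similar pair is an assumption inferred from the formula, not something the summary states.

```python
import numpy as np


def cosine_distance(f1, f2, eps=1e-12):
    """D_c = 1 - cos(f1, f2); eps guards against zero-norm embeddings."""
    f1 = np.asarray(f1, dtype=float)
    f2 = np.asarray(f2, dtype=float)
    denom = max(np.linalg.norm(f1) * np.linalg.norm(f2), eps)
    return 1.0 - float(np.dot(f1, f2)) / denom


def contrastive_loss(f1, f2, y, margin=1.35):
    """Loss = 1/2 * Y * D_c^2 + 1/2 * (1 - Y) * max(0, m - D_c)^2.

    Assumption: y lies in [0, 1], with y = 1 meaning the pair is similar.
    """
    d = cosine_distance(f1, f2)
    pull = 0.5 * y * d ** 2                                # draws similar pairs together
    push = 0.5 * (1.0 - y) * max(0.0, margin - d) ** 2     # separates dissimilar pairs
    return pull + push
```

For example, two orthogonal embeddings have $D_c = 1$; labeled dissimilar ($Y = 0$), they still incur a loss of $\frac{1}{2}(1.35 - 1)^2 = 0.06125$, because their distance is inside the margin. Since $D_c$ ranges over $[0, 2]$, only pairs with $D_c \ge 1.35$ contribute nothing to the dissimilar term.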

Abstract

Action Quality Assessment (AQA) requires fine-grained understanding of human motion and precise evaluation of pose similarity. This paper proposes a topology-aware Graph Convolutional Network (GCN) framework, termed GCN-PSN, which models the human skeleton as a graph to learn discriminative, topology-sensitive pose embeddings. Using a Siamese architecture trained with a contrastive regression objective, our method outperforms coordinate-based baselines and achieves competitive performance on AQA-7 and FineDiving benchmarks. Experimental results and ablation studies validate the effectiveness of leveraging skeletal topology for pose similarity and action quality assessment.

Paper Structure

This paper contains 23 sections, 6 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Overall pipeline of GCN-PSN. The flowchart shows the complete process from image input to the final similarity score, centered on a GCN that extracts topology-aware keypoint features.
  • Figure 2: The 15 human skeleton keypoints and their connections used in this paper. The numbering is: 0-Right Ankle, 1-Right Knee, 2-Right Hip, 3-Pelvis, 4-Left Hip, 5-Left Knee, 6-Left Ankle, 7-Right Wrist, 8-Right Elbow, 9-Right Shoulder, 10-Neck, 11-Left Shoulder, 12-Left Elbow, 13-Left Wrist, 14-Head.
  • Figure 3: Qualitative analysis results. Left: High similarity (score: 97.3). Center: Medium similarity (score: 72.5). Right: Low similarity (score: 11.8). The model accurately quantifies subtle differences between poses.
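The 15-keypoint skeleton of Figure 2 can be sketched as a graph as follows. The keypoint names and indices come from the caption, but the edge list is an assumption (a plausible anatomical tree), since the caption does not enumerate the connections. The symmetric normalization $\hat{A} = D^{-1/2}(A + I)D^{-1/2}$ and the single propagation step `gcn_layer` follow the common GCN formulation and are likewise assumptions; the paper's exact layer definition may differ.

```python
import numpy as np

# Keypoint names, indexed as in the Figure 2 caption.
KEYPOINTS = [
    "Right Ankle", "Right Knee", "Right Hip", "Pelvis", "Left Hip",
    "Left Knee", "Left Ankle", "Right Wrist", "Right Elbow",
    "Right Shoulder", "Neck", "Left Shoulder", "Left Elbow",
    "Left Wrist", "Head",
]

# ASSUMED bone connections (not listed in the caption).
ASSUMED_EDGES = [
    (0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6),            # legs via pelvis
    (7, 8), (8, 9), (9, 10), (10, 11), (11, 12), (12, 13),     # arms via neck
    (10, 14), (3, 10),                                         # head and spine
]


def normalized_adjacency(num_nodes, edges):
    """Return D^{-1/2} (A + I) D^{-1/2} for an undirected graph."""
    a = np.eye(num_nodes)              # self-loops
    for i, j in edges:
        a[i, j] = 1.0
        a[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(a_hat, h, w):
    """One propagation step: ReLU(A_hat @ H @ W)."""
    return np.maximum(a_hat @ h @ w, 0.0)


A_HAT = normalized_adjacency(len(KEYPOINTS), ASSUMED_EDGES)
```

With per-keypoint input features `h` of shape `(15, d_in)` and a weight matrix `w` of shape `(d_in, d_out)`, `gcn_layer(A_HAT, h, w)` returns a `(15, d_out)` array in which each keypoint's feature mixes information from its skeletal neighbors. This neighbor mixing is the sense in which the embedding is topology-aware.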