Motif Guided Graph Transformer with Combinatorial Skeleton Prototype Learning for Skeleton-Based Person Re-Identification

Haocong Rao, Chunyan Miao

TL;DR

This work tackles skeleton-based person re-identification, where existing methods learn relations between all joints and rely only on averaged features. It introduces MoCos, which comprises a Motif Guided Graph Transformer (MGT) that uses Hierarchical Structural Motifs and Gait Collaborative Motifs to capture multi-order structural and gait-related joint relations, and Combinatorial Skeleton Prototype Learning (CSP), which forms diverse sub-skeleton and sub-tracklet representations that are contrasted with identity prototypes. The approach is reported to outperform state-of-the-art methods on multiple benchmarks and to generalize to RGB-estimated skeletons and unsupervised scenarios. Overall, MoCos advances skeleton-based re-ID by jointly modeling structure-aware and gait-aware relations and by exploiting combinatorial sub-patterns of skeletons to learn robust, discriminative representations.

Abstract

Person re-identification (re-ID) via 3D skeleton data is a challenging task with significant value in many scenarios. Existing skeleton-based methods typically assume virtual motion relations between all joints, and adopt average joint or sequence representations for learning. However, they rarely explore key body structure and motion such as gait to focus on more important body joints or limbs, while lacking the ability to fully mine valuable spatial-temporal sub-patterns of skeletons to enhance model learning. This paper presents a generic Motif guided graph transformer with Combinatorial skeleton prototype learning (MoCos) that exploits structure-specific and gait-related body relations as well as combinatorial features of skeleton graphs to learn effective skeleton representations for person re-ID. In particular, motivated by the locality within joints' structure and the body-component collaboration in gait, we first propose the motif guided graph transformer (MGT) that incorporates hierarchical structural motifs and gait collaborative motifs, which simultaneously focuses on multi-order local joint correlations and key cooperative body parts to enhance skeleton relation learning. Then, we devise the combinatorial skeleton prototype learning (CSP) that leverages random spatial-temporal combinations of joint nodes and skeleton graphs to generate diverse sub-skeleton and sub-tracklet representations, which are contrasted with the most representative features (prototypes) of each identity to learn class-related semantics and discriminative skeleton representations. Extensive experiments validate the superior performance of MoCos over existing state-of-the-art models. We further show its generality under RGB-estimated skeletons, different graph modeling, and unsupervised scenarios.
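The CSP idea described in the abstract (random spatial and temporal combinations contrasted with identity prototypes) can be illustrated with a minimal NumPy sketch. This is not the paper's exact $\mathcal{L}_{\mathrm{CSP}}$: the function names `csp_loss` and `_info_nce`, the `keep_ratio` and `temperature` parameters, the use of a single random joint or frame subset per batch, and the softmax-style contrastive loss are all assumptions made for illustration.

```python
import numpy as np

def _info_nce(queries, prototypes, target, temperature):
    """Softmax contrastive loss between L2-normalized queries and prototypes."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    logits = q @ p.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(target)), target].mean()

def csp_loss(node_feats, labels, keep_ratio=0.5, temperature=0.1, seed=0):
    """Hypothetical sketch of combinatorial skeleton prototype learning.

    node_feats: (N, T, J, D) array of N tracklets, T frames, J joints, D dims.
    labels:     (N,) integer identity labels.
    """
    rng = np.random.default_rng(seed)
    _, T, J, _ = node_feats.shape
    classes = np.unique(labels)

    # Prototype per identity: average over joints, frames, and same-identity tracklets.
    tracklet_avg = node_feats.mean(axis=(1, 2))
    prototypes = np.stack([tracklet_avg[labels == c].mean(axis=0) for c in classes])

    # Sub-skeleton (spatial) combination: keep a random subset of joints.
    joints = rng.choice(J, size=max(1, int(round(keep_ratio * J))), replace=False)
    sub_skeleton = node_feats[:, :, joints, :].mean(axis=(1, 2))

    # Sub-tracklet (temporal) combination: keep a random subset of frames.
    frames = rng.choice(T, size=max(1, int(round(keep_ratio * T))), replace=False)
    sub_tracklet = node_feats[:, frames, :, :].mean(axis=(1, 2))

    # Pull each combinatorial feature toward its own identity prototype
    # and push it away from the prototypes of other identities.
    target = np.searchsorted(classes, labels)
    return 0.5 * (_info_nce(sub_skeleton, prototypes, target, temperature)
                  + _info_nce(sub_tracklet, prototypes, target, temperature))
```

In this sketch, resampling the joint and frame subsets at every training step (for example by varying `seed`) is what would expose the model to many different sub-skeleton and sub-tracklet views of the same identity.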

Paper Structure

This paper contains 15 sections, 13 equations, 3 figures, 5 tables.

Figures (3)

  • Figure 1: Our approach exploits various graph motifs to enhance skeletal relation learning, and utilizes diverse spatial (S.) and temporal (T.) combinatorial skeleton features to perform skeleton prototype learning for person re-ID.
  • Figure 2: Schematics of our approach: First, with position-encoded node representations for each skeleton graph $\mathcal{G}^{t}$, MGT incorporates hierarchical structural motifs (HSM) and gait collaborative motifs (GCM) to perform body relation learning, which concurrently focuses on multi-order structural correlations and gait-related collaborative body parts to enhance skeleton pattern learning. Then, CSP temporally and spatially masks joints and graphs to generate combinatorial sub-skeleton (SSk) and sub-tracklet (STr) representations, which are contrasted with skeleton prototypes generated from same-identity spatially-temporally averaged (S-Avg and T-Avg) skeleton graph representations. We enhance the similarity of both SSk and STr level features to their corresponding prototypes, while maximizing their dissimilarity to other prototypes by optimizing $\mathcal{L}_{\mathrm{CSP}}$.
  • Figure 3: (a) $t$-SNE visualization of features for the first ten classes in IAS and KS20. Different colors indicate different classes. (b) Visualization of mean relation values inferred by a non-motif method (rao2023transg, left) and by our MoCos (right), shown on the same value scale and for the same testing skeletons.
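The caption of Figure 2 mentions hierarchical structural motifs (HSM) that focus on multi-order structural correlations. One plausible reading, sketched below, is a set of per-order masks that mark which joint pairs are exactly $k$ hops apart in the skeleton graph. The function name `k_hop_masks`, the 5-joint chain used as a toy skeleton, and the interpretation of "order" as hop distance are assumptions for illustration, not the paper's definition.

```python
import numpy as np

def k_hop_masks(adjacency, max_order):
    """Return [M_1, ..., M_K], where M_k[i, j] is True iff joints i and j
    are exactly k hops apart in the (undirected) skeleton graph."""
    n = adjacency.shape[0]
    adj = adjacency.astype(int)
    reached = np.eye(n, dtype=bool)    # joint pairs already assigned an order
    frontier = np.eye(n, dtype=bool)   # joints at distance k-1 from each row joint
    masks = []
    for _ in range(max_order):
        # Expand the frontier by one hop and drop joints seen at a lower order.
        nxt = (frontier.astype(int) @ adj > 0) & ~reached
        masks.append(nxt)
        reached |= nxt
        frontier = nxt
    return masks

# Toy skeleton (hypothetical): five joints connected in a chain 0-1-2-3-4.
chain = np.zeros((5, 5), dtype=int)
for i in range(4):
    chain[i, i + 1] = chain[i + 1, i] = 1

first_order, second_order = k_hop_masks(chain, max_order=2)
```

Masks of this kind could, for instance, restrict or bias attention scores in a graph transformer so that each order attends only to joints at the corresponding structural distance. How MGT actually combines HSM with gait collaborative motifs is not specified in this section.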