GaitPT: Skeletons Are All You Need For Gait Recognition

Andy Catruna; Adrian Cosma; Emilian Radoi

TL;DR

GaitPT addresses gait-based person identification using skeleton sequences, without appearance cues. It introduces a four-stage hierarchical transformer that uses anatomical priors to model movement at the joint, limb, and body-group levels, producing a discriminative 256-d gait embedding. The method achieves state-of-the-art results on CASIA-B, GREW, and Gait3D. Ablations confirm the benefit of each stage and show that performance depends strongly on pose-estimation quality. The approach offers a privacy-friendly and scalable alternative for gait recognition, with applications in surveillance, healthcare, and human-computer interaction.

Abstract

The analysis of walking patterns is an important area of research with numerous applications in security, healthcare, sports, and human-computer interaction. Lately, walking patterns have been regarded as a unique fingerprinting method for automatic person identification at a distance. In this work, we propose a novel gait recognition architecture called Gait Pyramid Transformer (GaitPT) that leverages pose estimation skeletons to capture unique walking patterns, without relying on appearance information. GaitPT adopts a hierarchical transformer architecture that effectively extracts both spatial and temporal features of movement in an anatomically consistent manner, guided by the structure of the human skeleton. Our results show that GaitPT achieves state-of-the-art performance compared to other skeleton-based gait recognition works, in both controlled and in-the-wild scenarios. GaitPT obtains 82.6% average accuracy on CASIA-B, surpassing other works by a margin of 6%. Moreover, it obtains 52.16% Rank-1 accuracy on GREW, outperforming both skeleton-based and appearance-based approaches.

Paper Structure (15 sections, 6 equations, 3 figures, 6 tables)

Introduction
Related Work
Appearance-based Approaches
Model-based Approaches
Method
Preliminaries
Gait Pyramid Transformer
Implementation Details
Experiments and Results
Evaluation in Controlled Scenarios
Evaluation In the Wild
Ablation on GaitPT Stages
Pose Estimator Effect on Gait Performance
Limitations and Societal Impact
Conclusions

Figures (3)

Figure 1: Overview of the GaitPT architecture. The model uses spatial and temporal attention to incrementally learn the natural motion of the human body. It computes spatio-temporal interactions at the joint level in Stage 1, at the limb level in Stage 2, and at the level of groups of limbs in Stage 3; in the final stage it computes the interactions between full skeletons. The representations obtained from every stage are combined into a gait embedding that captures discriminative features at all levels of movement.
Figure 2: Visualization of the Spatial Attention across all stages in the GaitPT architecture. Spatial Attention is applied across multiple joints / limbs of the body within the same time step. It is performed at the joint level in Stage 1, at the limb level in Stage 2, and at the level of groups of limbs in Stage 3.
Figure 3: Visualization of the Temporal Attention across all stages in the GaitPT architecture. Temporal Attention is applied to the same joints / limbs across different time steps. It is performed at the joint level in Stage 1, at the limb level in Stage 2, at the level of groups of limbs in Stage 3, and at the whole-body level in Stage 4.
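
The difference between the two attention types in the captions above comes down to which axis of the skeleton sequence is treated as the token axis. The following is a minimal NumPy sketch of that idea, not the authors' implementation: the attention uses identity projections instead of learned query/key/value weights, the 8-joint toy skeleton and its groupings are made up for illustration, and `merge_parts` (mean pooling) is a hypothetical stand-in for however GaitPT actually merges joints into limbs and limbs into groups.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens):
    # Single-head self-attention with identity projections (illustration only).
    # tokens: (batch, n_tokens, dim); mixing happens along the n_tokens axis.
    dim = tokens.shape[-1]
    scores = tokens @ tokens.transpose(0, 2, 1) / np.sqrt(dim)
    return softmax(scores, axis=-1) @ tokens

def spatial_attention(x):
    # x: (T, P, D) with T time steps, P body parts, D features.
    # Each time step is one batch item, so parts attend to each other
    # only within the same frame.
    return self_attention(x)

def temporal_attention(x):
    # Each body part is one batch item, so the same part attends to
    # itself across different time steps.
    return self_attention(x.transpose(1, 0, 2)).transpose(1, 0, 2)

def merge_parts(x, groups):
    # Hypothetical merge: average the features of the parts in each group
    # to obtain a coarser representation (joints -> limbs -> groups -> body).
    return np.stack([x[:, idx, :].mean(axis=1) for idx in groups], axis=1)

# Toy input: 4 frames, 8 joints, 16 features per joint (all sizes assumed).
rng = np.random.default_rng(0)
joints = rng.normal(size=(4, 8, 16))

# Stage 1: joint level.
s1 = temporal_attention(spatial_attention(joints))
# Stage 2: limb level (toy grouping of joint pairs).
limbs = merge_parts(s1, [[0, 1], [2, 3], [4, 5], [6, 7]])
s2 = temporal_attention(spatial_attention(limbs))
# Stage 3: groups of limbs.
limb_groups = merge_parts(s2, [[0, 1], [2, 3]])
s3 = temporal_attention(spatial_attention(limb_groups))
# Stage 4: whole body, temporal attention only (a single spatial token).
body = merge_parts(s3, [[0, 1]])
s4 = temporal_attention(body)

print(s1.shape, s2.shape, s3.shape, s4.shape)
```

In this sketch the number of spatial tokens shrinks from stage to stage while the temporal length stays fixed, which is why the last stage has nothing left to attend over spatially and applies temporal attention alone.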
