GaitPT: Skeletons Are All You Need For Gait Recognition
Andy Catruna, Adrian Cosma, Emilian Radoi
TL;DR
GaitPT addresses gait-based person identification using skeleton sequences without appearance cues. It introduces a four-stage hierarchical transformer that leverages anatomical priors to model joint, limb, and body-group movements, producing a discriminative 256-d gait embedding. The method achieves state-of-the-art results on CASIA-B, GREW, and Gait3D, with ablations confirming the benefit of each stage and showing that accuracy depends strongly on pose-estimation quality. This approach offers a privacy-friendly and scalable alternative for gait recognition, with applications in surveillance, healthcare, and human-computer interaction.
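To make the idea of an anatomically guided hierarchy concrete, the following is a minimal NumPy sketch of how per-joint features could be merged into limb, group, and whole-body tokens. It is not the authors' implementation: the 17-keypoint COCO layout, the `LIMBS` and `GROUPS` partitions, the feature size of 64, and the use of mean pooling in place of learned merging and attention are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical partition of the 17 COCO keypoints into limbs.
# The paper's exact grouping may differ.
LIMBS = {
    "head": [0, 1, 2, 3, 4],
    "left_arm": [5, 7, 9],
    "right_arm": [6, 8, 10],
    "left_leg": [11, 13, 15],
    "right_leg": [12, 14, 16],
}
# Hypothetical partition of limbs into larger body groups.
GROUPS = {
    "head": ["head"],
    "arms": ["left_arm", "right_arm"],
    "legs": ["left_leg", "right_leg"],
}


def merge(tokens, index_groups):
    """Mean-pool tokens along the token axis (axis 1) for each index group."""
    pooled = [tokens[:, idx].mean(axis=1) for idx in index_groups]
    return np.stack(pooled, axis=1)


def pyramid(joint_feats):
    """Map (frames, 17, dim) joint features to tokens at four levels."""
    limb_names = list(LIMBS)
    limbs = merge(joint_feats, list(LIMBS.values()))
    group_idx = [[limb_names.index(n) for n in members]
                 for members in GROUPS.values()]
    groups = merge(limbs, group_idx)
    body = groups.mean(axis=1, keepdims=True)
    return {"joint": joint_feats, "limb": limbs, "group": groups, "body": body}


rng = np.random.default_rng(0)
stages = pyramid(rng.normal(size=(30, 17, 64)))  # 30 frames, 17 joints
shapes = {name: arr.shape for name, arr in stages.items()}
print(shapes)
```

In a transformer such as the one described, each level would be processed by spatial and temporal attention before merging. The sketch only shows how the number of tokens shrinks from 17 joints to 5 limbs, 3 groups, and 1 body token per frame.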
Abstract
The analysis of patterns of walking is an important area of research that has numerous applications in security, healthcare, sports and human-computer interaction. Lately, walking patterns have been regarded as a unique fingerprinting method for automatic person identification at a distance. In this work, we propose a novel gait recognition architecture called Gait Pyramid Transformer (GaitPT) that leverages pose estimation skeletons to capture unique walking patterns, without relying on appearance information. GaitPT adopts a hierarchical transformer architecture that effectively extracts both spatial and temporal features of movement in an anatomically consistent manner, guided by the structure of the human skeleton. Our results show that GaitPT achieves state-of-the-art performance compared to other skeleton-based gait recognition works, in both controlled and in-the-wild scenarios. GaitPT obtains 82.6% average accuracy on CASIA-B, surpassing other works by a margin of 6%. Moreover, it obtains 52.16% Rank-1 accuracy on GREW, outperforming both skeleton-based and appearance-based approaches.
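For readers unfamiliar with the Rank-1 metric quoted above, the following is a minimal sketch of how it is commonly computed from gait embeddings: each probe sequence is matched to its nearest gallery embedding, and the match counts as correct if the identities agree. The function name `rank1_accuracy`, the use of Euclidean distance, and the toy 2-d embeddings are illustrative assumptions. The exact evaluation protocol for each benchmark is defined by the dataset, not by this sketch.

```python
import numpy as np


def rank1_accuracy(probe_emb, probe_ids, gallery_emb, gallery_ids):
    """Fraction of probes whose nearest gallery embedding shares their identity."""
    # Pairwise Euclidean distances, shape (n_probe, n_gallery).
    diffs = probe_emb[:, None, :] - gallery_emb[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    nearest = dists.argmin(axis=1)
    matches = np.asarray(gallery_ids)[nearest] == np.asarray(probe_ids)
    return float(matches.mean())


# Toy data: three gallery identities and four probes (2-d for readability).
gallery_emb = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
gallery_ids = np.array([0, 1, 2])
probe_emb = np.array([[0.1, 0.0], [0.9, 0.1], [0.6, 0.1], [0.1, 0.8]])
probe_ids = np.array([0, 1, 0, 2])

acc = rank1_accuracy(probe_emb, probe_ids, gallery_emb, gallery_ids)
print(acc)  # 0.75: the third probe is closest to identity 1, not 0
```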
