GaitPT: Skeletons Are All You Need For Gait Recognition

Andy Catruna, Adrian Cosma, Emilian Radoi

TL;DR

GaitPT addresses gait-based person identification using skeleton sequences without appearance cues. It introduces a four-stage hierarchical transformer that leverages anatomical priors to model joint, limb, and body-group movements, producing a discriminative 256-d gait embedding. The method achieves state-of-the-art results on CASIA-B, GREW, and Gait3D, with ablations confirming the benefit of each stage and a strong dependency on pose-estimation quality. This approach offers a privacy-friendly and scalable alternative for gait recognition, with applications in surveillance, healthcare, and human-computer interaction.

Abstract

The analysis of patterns of walking is an important area of research that has numerous applications in security, healthcare, sports and human-computer interaction. Lately, walking patterns have been regarded as a unique fingerprinting method for automatic person identification at a distance. In this work, we propose a novel gait recognition architecture called Gait Pyramid Transformer (GaitPT) that leverages pose estimation skeletons to capture unique walking patterns, without relying on appearance information. GaitPT adopts a hierarchical transformer architecture that effectively extracts both spatial and temporal features of movement in an anatomically consistent manner, guided by the structure of the human skeleton. Our results show that GaitPT achieves state-of-the-art performance compared to other skeleton-based gait recognition works, in both controlled and in-the-wild scenarios. GaitPT obtains 82.6% average accuracy on CASIA-B, surpassing other works by a margin of 6%. Moreover, it obtains 52.16% Rank-1 accuracy on GREW, outperforming both skeleton-based and appearance-based approaches.

Paper Structure

This paper contains 15 sections, 6 equations, 3 figures, 6 tables.

Figures (3)

  • Figure 1: Overview of the GaitPT architecture. The model uses spatial and temporal attention to incrementally learn the natural motion of the human body. It computes spatio-temporal interactions at the joint level in Stage 1, at the limb level in Stage 2, and at the level of groups of limbs in Stage 3; in the final stage, it computes the interactions between full skeletons. The representations obtained from every stage are combined into a gait embedding that captures discriminative features at all levels of movement.
  • Figure 2: Visualization of the Spatial Attention across all stages in the GaitPT architecture. Spatial Attention is applied across multiple joints / limbs of the body in the same time step. Spatial Attention is performed at the joint level at Stage 1, at the limb level at Stage 2, and at the level of groups of limbs at Stage 3.
  • Figure 3: Visualization of the Temporal Attention across all stages in the GaitPT architecture. Temporal Attention is applied to the same joint / limb across different time steps. It is performed at the joint level in Stage 1, at the limb level in Stage 2, at the level of groups of limbs in Stage 3, and at the whole-body level in Stage 4.
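
The stage hierarchy and the two attention axes described in the captions above can be sketched as plain index bookkeeping. This is a minimal illustration under stated assumptions, not the authors' implementation: the 17-keypoint layout, the `JOINTS_TO_LIMBS` and `LIMBS_TO_GROUPS` partitions, and the averaging used to merge parts are all hypothetical stand-ins, since this section does not specify the actual groupings or the merge operation.

```python
# Hypothetical partition of 17 COCO-style keypoints into limbs.
# The paper's actual grouping is not given in this section.
JOINTS_TO_LIMBS = {
    "head": [0, 1, 2, 3, 4],
    "left_arm": [5, 7, 9],
    "right_arm": [6, 8, 10],
    "left_leg": [11, 13, 15],
    "right_leg": [12, 14, 16],
}
# Hypothetical partition of limbs into groups of limbs.
LIMBS_TO_GROUPS = {
    "head": ["head"],
    "arms": ["left_arm", "right_arm"],
    "legs": ["left_leg", "right_leg"],
}


def spatial_groups(num_frames, num_parts):
    """Token sets for spatial attention: all parts within one time step."""
    return [[(t, p) for p in range(num_parts)] for t in range(num_frames)]


def temporal_groups(num_frames, num_parts):
    """Token sets for temporal attention: one part across all time steps."""
    return [[(t, p) for t in range(num_frames)] for p in range(num_parts)]


def mean_vec(vectors):
    """Element-wise mean of equally sized feature vectors."""
    return [sum(vals) / len(vectors) for vals in zip(*vectors)]


def merge(parts, partition):
    """Merge part features into coarser parts by averaging.

    Averaging is an assumption here; it stands in for whatever
    merge operation the model actually uses.
    """
    return {
        name: mean_vec([parts[m] for m in members])
        for name, members in partition.items()
    }


def stage_tokens(frame):
    """Build the tokens of Stages 1 to 4 for one frame of [x, y] joints."""
    joints = {j: list(xy) for j, xy in enumerate(frame)}
    limbs = merge(joints, JOINTS_TO_LIMBS)
    groups = merge(limbs, LIMBS_TO_GROUPS)
    body = merge(groups, {"body": list(groups)})
    return [joints, limbs, groups, body]


# Toy frame: 17 joints with made-up coordinates.
frame = [[float(j), float(2 * j)] for j in range(17)]
stages = stage_tokens(frame)
print([len(stage) for stage in stages])  # [17, 5, 3, 1]
```

With a single whole-body token per frame in the last stage, each spatial group would contain only one element, which is consistent with Figure 2 showing Spatial Attention for Stages 1 to 3 only, while Figure 3 shows Temporal Attention in all four stages.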