Exploring Deep Models for Practical Gait Recognition

Chao Fan, Saihui Hou, Yongzhen Huang, Shiqi Yu

TL;DR

The paper tackles practical gait recognition, arguing that existing shallow models fail to generalize to in-the-wild data. It advocates a depth-oriented approach that emphasizes explicit temporal modeling and transformer-based architectures, and introduces the CNN-based DeepGaitV2 series and the Transformer-based SwinGait series. On the large real-world datasets GREW and Gait3D, both series achieve substantial performance gains, with SwinGait often leading the outdoor results. The authors provide a practical baseline framework, ablation studies, and the open-source OpenGait codebase, highlighting both the progress made and the challenges that remain in bridging the gap to real-world deployment.

Abstract

Gait recognition is a rapidly advancing vision technique for person identification from a distance. Prior studies predominantly employed relatively shallow networks to extract subtle gait features, achieving impressive successes in constrained settings. Nevertheless, experiments revealed that existing methods mostly produce unsatisfactory results when applied to newly released real-world gait datasets. This paper presents a unified perspective to explore how to construct deep models for state-of-the-art outdoor gait recognition, including the classical CNN-based and emerging Transformer-based architectures. Specifically, we challenge the stereotype of shallow gait models and demonstrate the superiority of explicit temporal modeling and deep transformer structure for discriminative gait representation learning. Consequently, the proposed CNN-based DeepGaitV2 series and Transformer-based SwinGait series exhibit significant performance improvements on Gait3D and GREW. As for the constrained gait datasets, the DeepGaitV2 series also reaches a new state-of-the-art in most cases, convincingly showing its practicality and generality. The source code is available at https://github.com/ShiqiYu/OpenGait.

Paper Structure (8 sections, 8 figures, 7 tables)

Figures (8)

  • Figure 1: Constrained vs. real-world gait recognition.
  • Figure 2: Dumb patch issue on the gait silhouette.
  • Figure 3: The gait framework and basic blocks employed for building DeepGaitV2 and SwinGait series.
  • Figure 4: Rank-1 accuracy of DeepGaitV2-2D and 3D as their backbone networks grow deeper. The performance of several state-of-the-art methods is shown for reference.
  • Figure 5: The DeepGaitV2-3D series overfits on (a) CASIA-B and (b) OU-MVLP as network depth increases. The loss number is the count of triplets that produce non-zero loss in the training batch, directly reflecting the network's convergence state.
  • ...and 3 more figures