Exploring Deep Models for Practical Gait Recognition
Chao Fan, Saihui Hou, Yongzhen Huang, Shiqi Yu
TL;DR
The paper tackles practical gait recognition, arguing that existing shallow models fail to generalize to in-the-wild data. It advocates deeper networks that combine explicit temporal modeling with Transformer-based architectures, and introduces the CNN-based DeepGaitV2 series and the Transformer-based SwinGait series. On the large real-world datasets GREW and Gait3D, both series achieve substantial performance gains, with SwinGait often leading the outdoor results. The authors provide a practical baseline framework, ablation studies, and the open-source OpenGait codebase, highlighting both the progress made and the challenges that remain before real-world deployment.
Abstract
Gait recognition is a rapidly advancing vision technique for person identification from a distance. Prior studies predominantly employed relatively shallow networks to extract subtle gait features, achieving impressive successes in constrained settings. Nevertheless, experiments reveal that existing methods mostly produce unsatisfactory results when applied to newly released real-world gait datasets. This paper presents a unified perspective to explore how to construct deep models for state-of-the-art outdoor gait recognition, including the classical CNN-based and emerging Transformer-based architectures. Specifically, we challenge the stereotype of shallow gait models and demonstrate the superiority of explicit temporal modeling and deep Transformer structures for discriminative gait representation learning. Consequently, the proposed CNN-based DeepGaitV2 series and Transformer-based SwinGait series exhibit significant performance improvements on Gait3D and GREW. On constrained gait datasets, the DeepGaitV2 series also reaches a new state of the art in most cases, convincingly showing its practicality and generality. The source code is available at https://github.com/ShiqiYu/OpenGait.
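
To make the phrase "explicit temporal modeling" concrete, the following is a minimal illustrative sketch, not the DeepGaitV2 or SwinGait implementation. It assumes per-frame feature vectors are already available and contrasts two ways of aggregating them over time: an order-agnostic max pool, and a small temporal filter applied before pooling. The function names, the toy features, and the difference kernel are all hypothetical choices made for illustration.

```python
from typing import List

Frames = List[List[float]]  # T frames, each a C-dimensional feature vector


def temporal_max_pool(frames: Frames) -> List[float]:
    """Order-agnostic aggregation: per-channel maximum over all frames."""
    return [max(channel) for channel in zip(*frames)]


def temporal_conv_then_pool(frames: Frames, kernel: List[float]) -> List[float]:
    """Order-aware aggregation: slide a 1D filter along time, then max-pool.

    The same kernel is shared by every channel (an assumption of this sketch).
    """
    k = len(kernel)
    num_channels = len(frames[0])
    responses = [
        [sum(kernel[i] * frames[t + i][c] for i in range(k))
         for c in range(num_channels)]
        for t in range(len(frames) - k + 1)
    ]
    return temporal_max_pool(responses)


# Toy sequence: every channel ramps up linearly over 8 frames.
frames = [[float(t * (c + 1)) for c in range(4)] for t in range(8)]
reversed_frames = frames[::-1]   # same frames, played backwards
kernel = [-1.0, 0.0, 1.0]        # hypothetical temporal-difference filter

# Max pooling cannot tell the two orderings apart ...
print(temporal_max_pool(frames) == temporal_max_pool(reversed_frames))
# ... while the temporal filter responds differently to each ordering.
print(temporal_conv_then_pool(frames, kernel))
print(temporal_conv_then_pool(reversed_frames, kernel))
```

In this toy setting the pooled features are identical for the forward and reversed sequences, whereas the temporally filtered features change sign. That order sensitivity is the property the sketch is meant to illustrate; how the paper's models realize temporal modeling is described in the paper itself and in the OpenGait code.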
