GaitSTR: Gait Recognition with Sequential Two-stream Refinement
Wanrong Zheng, Haidong Zhu, Zhaoheng Zheng, Ram Nevatia
TL;DR
Problem: robust gait recognition from walking sequences is hampered by appearance variance induced by clothing and carried objects, and by frame-wise jitter in estimated skeletons.
Approach: GaitSTR fuses silhouettes with skeletons, refines joints and bones with a skeleton correction network, and uses cross-modal adapters to perform sequential two-stream refinement guided by temporal cues from silhouettes.
Contributions: a two-stream (joint and bone) skeleton representation, internal skeleton self-correction, silhouette-guided cross-modal correction, and end-to-end training with triplet and classification losses. GaitSTR is evaluated on CASIA-B, OU-MVLP, Gait3D, and GREW, where it achieves state-of-the-art results without extra annotations.
Significance: improved robustness to occlusion and appearance changes makes gait-based identification more reliable in real-world, long-distance scenarios.
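The TL;DR states that training combines triplet and classification losses. The sketch below shows one common way to do this for a single sample: a hinge triplet loss on squared Euclidean distances plus a softmax cross-entropy on identity logits. The margin of 0.2, the weight `ce_weight`, and all function names are hypothetical choices for illustration, not values or code taken from GaitSTR.

```python
import math

def squared_distance(a, b):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge triplet loss: the same-identity pair must be closer than the
    different-identity pair by at least `margin` (margin is an assumption)."""
    return max(0.0, squared_distance(anchor, positive)
               - squared_distance(anchor, negative) + margin)

def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample, using log-sum-exp for stability."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

def total_loss(anchor, positive, negative, logits, label,
               margin=0.2, ce_weight=1.0):
    """Hypothetical weighted sum of both objectives; the weighting is assumed."""
    return (triplet_loss(anchor, positive, negative, margin)
            + ce_weight * cross_entropy(logits, label))
```

In practice such losses are averaged over mined triplets in a batch; the per-sample form above only illustrates how the two terms combine.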
Abstract
Gait recognition aims to identify a person from their walking sequence. It is a useful biometric modality because gait can be observed from long distances without the subject's cooperation. Silhouettes and skeletons are the two primary modalities used to represent a walking sequence. Silhouette sequences lose detailed part information when body segments overlap, and they are affected by carried objects and clothing. Skeletons, which consist of joints and the bones connecting them, provide more accurate part information for individual segments; however, they are sensitive to occlusion and low image quality, which causes frame-wise inconsistencies within a sequence. In this paper, we explore a two-stream representation of skeletons, alongside silhouettes, for gait recognition. By fusing silhouettes and skeletons, we refine both skeleton streams, joints and bones, through self-correction in graph convolution and through cross-modal correction that draws on the temporal consistency of silhouettes. We demonstrate that, with refined skeletons, the gait recognition model further improves on state-of-the-art methods on public gait recognition datasets without extra annotations.
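To make the two-stream idea concrete, the sketch below derives a bone stream from a joint stream (each bone is the vector from a joint's parent to the joint, with a zero vector for the root) and then applies a residual graph-based correction that pulls each joint toward the mean of itself and its skeleton neighbors. The 5-joint skeleton in `PARENTS`, the strength `alpha`, and all function names are hypothetical. The correction step is a fixed, non-learned graph-smoothing stand-in using the row-normalized adjacency D^-1(A + I); it is not the learned self-correction network described in the paper.

```python
# Hypothetical 5-joint skeleton: hip(0) -> spine(1) -> neck(2),
# hip(0) -> left knee(3) -> left ankle(4). -1 marks the root joint.
PARENTS = [-1, 0, 1, 0, 3]

def joints_to_bones(joints, parents=PARENTS):
    """Bone stream: vector from each joint's parent to the joint; root -> zero vector."""
    bones = []
    for j, p in enumerate(parents):
        if p < 0:
            bones.append(tuple(0.0 for _ in joints[j]))
        else:
            bones.append(tuple(c - pc for c, pc in zip(joints[j], joints[p])))
    return bones

def row_normalized_adjacency(parents):
    """D^-1 (A + I): each row averages a joint with its graph neighbors."""
    n = len(parents)
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # self-loops
    for j, p in enumerate(parents):
        if p >= 0:
            adj[j][p] = adj[p][j] = 1.0  # undirected parent-child edge
    return [[v / sum(row) for v in row] for row in adj]

def graph_correct(joints, parents=PARENTS, alpha=0.5):
    """Residual correction: move each joint a fraction `alpha` toward the
    average of its neighborhood (a non-learned stand-in for a GCN step)."""
    a_hat = row_normalized_adjacency(parents)
    n, dims = len(joints), len(joints[0])
    refined = []
    for i in range(n):
        agg = [sum(a_hat[i][j] * joints[j][d] for j in range(n)) for d in range(dims)]
        refined.append(tuple(joints[i][d] + alpha * (agg[d] - joints[i][d])
                             for d in range(dims)))
    return refined
```

For example, if every joint sits at the origin except a jittered neck at (0, 4), `graph_correct` moves the neck to (0, 3), halfway toward the average of itself and the spine. A learned graph convolution would replace the fixed averaging with trainable weights.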
