GaitASMS: Gait Recognition by Adaptive Structured Spatial Representation and Multi-Scale Temporal Aggregation
Yan Sun, Hu Long, Xueling Feng, Mark Nixon
TL;DR
GaitASMS addresses occlusion and view variation in gait recognition with two components. The first is an Adaptive Structured Representation Extraction (ASRE) module for edge-aware local features, combined with a Global Feature Extractor. The second is a Multi-Scale Temporal Aggregation (MSTA) module that models long- and short-range temporal information using dilated 3D convolutions. A novel random-mask augmentation improves the model's robustness to occlusion. On CASIA-B and OU-MVLP, GaitASMS achieves state-of-the-art or competitive performance. Extensive ablations confirm the effectiveness of ASRE, MSTA, and the random-mask strategy, and show that these components transfer well to related architectures such as GaitGL.
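The long/short-range idea behind MSTA can be illustrated with parallel dilated convolutions along the time axis. The NumPy sketch below is not the authors' implementation, and several details are assumptions. It takes a feature sequence shaped (T, C, H, W) and uses fixed rather than learned kernels. Each kernel is restricted to k×1×1, so a dilated 3D convolution with spatial extent 1 reduces to a 1-D filter over frames. The sketch sums branches with dilations 1, 2 and 3 and finishes with temporal max pooling. The names `dilated_temporal_conv` and `multi_scale_temporal_aggregation` are hypothetical.

```python
import numpy as np


def dilated_temporal_conv(x: np.ndarray, kernel: np.ndarray, dilation: int) -> np.ndarray:
    """Correlate x along axis 0 (time) with an odd-length kernel.

    Zero 'same' padding keeps the number of frames unchanged; the same
    kernel is shared across all remaining axes (channels, height, width).
    """
    k = kernel.shape[0]
    assert k % 2 == 1, "odd kernel length keeps the padding symmetric"
    pad = dilation * (k - 1) // 2
    t = x.shape[0]
    padded = np.pad(x, [(pad, pad)] + [(0, 0)] * (x.ndim - 1))
    out = np.zeros(x.shape, dtype=float)
    for i in range(k):
        # Tap i reads frame (tau + i * dilation - pad) for output frame tau.
        out += kernel[i] * padded[i * dilation : i * dilation + t]
    return out


def multi_scale_temporal_aggregation(x, kernels, dilations=(1, 2, 3)):
    """Sum short- and long-range branches, then max-pool over time."""
    fused = sum(dilated_temporal_conv(x, w, d) for w, d in zip(kernels, dilations))
    return fused.max(axis=0)


rng = np.random.default_rng(0)
feats = rng.standard_normal((30, 64, 16, 11))  # (T, C, H, W) frame-level features
kernels = [np.full(3, 1.0 / 3.0)] * 3          # fixed averaging taps per branch (assumed)
pooled = multi_scale_temporal_aggregation(feats, kernels)
print(pooled.shape)  # (64, 16, 11)
```

A larger dilation lets a 3-tap kernel span more frames without extra weights: 3 frames at dilation 1, 5 at dilation 2 and 7 at dilation 3. This is why combining dilations yields both short and long temporal context at the same per-branch cost.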
Abstract
Gait recognition is one of the most promising video-based biometric technologies. Silhouette edges and motion are the most informative features, and previous studies have explored them separately with notable results. However, under occlusion and changes in viewing angle, their recognition performance is often limited by a predefined spatial segmentation strategy. Moreover, traditional temporal pooling usually overlooks distinctive temporal information in gait. To address these issues, we propose a novel gait recognition framework, denoted GaitASMS, which effectively extracts adaptive structured spatial representations and naturally aggregates multi-scale temporal information. The Adaptive Structured Representation Extraction Module (ASRE) separates the edges of silhouettes using an adaptive edge mask and maximizes the representation in the semantic latent space. The Multi-Scale Temporal Aggregation Module (MSTA) effectively models both long- and short-range temporal information through a temporal aggregation structure. Furthermore, we propose a new data augmentation, denoted random mask, which enriches the sample space of long-term occlusion and improves the generalization of the model. Extensive experiments on two datasets demonstrate the competitive advantage of the proposed method, especially in complex scenes such as walking with a bag (BG) and wearing a coat (CL). On the CASIA-B dataset, GaitASMS achieves an average accuracy of 93.5% and outperforms the baseline in rank-1 accuracy by 3.4% under BG and 6.3% under CL. Ablation experiments demonstrate the effectiveness of ASRE and MSTA. The source code is available at https://github.com/YanSungithub/GaitASMS.
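The random-mask augmentation is described here only as enriching long-term occlusion samples, so the NumPy sketch below fills in its details by assumption. A silhouette sequence is shaped (T, H, W). With probability `p`, one horizontal band of rows covering roughly 10% to 30% of the frame height is zeroed at the same position in every frame, which mimics an occluder that persists through the whole clip. `random_mask`, `p`, `min_frac` and `max_frac` are hypothetical names and values, not taken from the GaitASMS code.

```python
import numpy as np


def random_mask(seq: np.ndarray, rng: np.random.Generator, p: float = 0.5,
                min_frac: float = 0.1, max_frac: float = 0.3) -> np.ndarray:
    """Hypothetical random mask: with probability p, zero one horizontal band of rows.

    seq has shape (T, H, W). The band position is drawn once per sequence,
    so the simulated occluder stays in place for the whole clip.
    """
    if rng.random() >= p:
        return seq  # augmentation skipped; input returned unchanged
    _, h, _ = seq.shape
    band = int(rng.integers(int(min_frac * h), int(max_frac * h) + 1))
    top = int(rng.integers(0, h - band + 1))
    out = seq.copy()  # never modify the caller's array in place
    out[:, top:top + band, :] = 0
    return out


rng = np.random.default_rng(42)
clip = np.ones((30, 64, 44), dtype=np.float32)  # toy all-foreground silhouettes
masked = random_mask(clip, rng, p=1.0)
masked_rows = np.flatnonzero(~masked[0].any(axis=1))
print(f"masked rows {masked_rows.min()}..{masked_rows.max()} in all {len(masked)} frames")
```

The band is drawn once per sequence rather than once per frame. This separates a persistent, long-term occlusion from frame-level noise. A per-frame variant would move the two `rng.integers` draws inside a loop over frames.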
