CLASH: Complementary Learning with Neural Architecture Search for Gait Recognition
Huanzhang Dou, Pengyi Zhang, Yuhan Zhao, Lu Jin, Xi Li
TL;DR
This work tackles gait recognition by addressing a limitation of silhouette representations: their walking-pattern information is concentrated in a sparse boundary. It introduces the dense spatial-temporal field (DSTF), a dense texture derived from a bidirectional distance transform of the silhouette, and fuses it with silhouette features through NAS-based complementary learning (NCL), which searches the architecture of a multi-descriptor cell. The framework achieves state-of-the-art performance on both in-the-lab and in-the-wild datasets, notably improving robustness and accuracy under challenging cross-view and unconstrained conditions. The results support the use of dense texture representations and automatically searched fusion architectures for gait analysis, with practical implications for surveillance and biometric systems.
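The summary says DSTF comes from a bidirectional distance transform but does not give the formula. The sketch below shows one plausible reading, not the authors' implementation: every foreground pixel gets a positive distance to the nearest background pixel, every background pixel gets a negative distance to the nearest foreground pixel, and the per-frame fields are stacked and scaled to [-1, 1]. The function names `bidirectional_distance_field` and `dense_field_sequence`, the sign convention, and the global normalization are hypothetical choices for illustration.

```python
import numpy as np


def _nearest_distance(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Euclidean distance from each (row, col) in `src` to its closest point in `dst`.

    Brute force over all pairs: fine for toy masks, too slow for real frames.
    """
    if len(dst) == 0:  # degenerate frame (all foreground or all background)
        return np.zeros(len(src))
    diff = src[:, None, :] - dst[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)


def bidirectional_distance_field(mask: np.ndarray) -> np.ndarray:
    """Hypothetical DSTF-style field for one binary silhouette frame.

    Assumption: foreground pixels get +distance to the nearest background pixel,
    background pixels get -distance to the nearest foreground pixel, so every
    pixel (not just the boundary) carries shape information.
    """
    mask = mask.astype(bool)
    fg = np.argwhere(mask)   # coordinates inside the silhouette
    bg = np.argwhere(~mask)  # coordinates outside the silhouette
    field = np.zeros(mask.shape, dtype=float)
    # Boolean-mask assignment fills pixels in the same row-major order argwhere uses.
    field[mask] = _nearest_distance(fg, bg)
    field[~mask] = -_nearest_distance(bg, fg)
    return field


def dense_field_sequence(silhouettes: np.ndarray) -> np.ndarray:
    """Map a (T, H, W) binary silhouette sequence to a (T, H, W) field in [-1, 1]."""
    fields = np.stack([bidirectional_distance_field(f) for f in silhouettes])
    scale = np.abs(fields).max()  # assumption: one global scale for the whole sequence
    return fields / scale if scale > 0 else fields


if __name__ == "__main__":
    frame = np.zeros((5, 5), dtype=np.uint8)
    frame[1:4, 1:4] = 1  # 3x3 square standing in for a silhouette
    print(bidirectional_distance_field(frame))
```

On the 3x3 square, the center pixel gets 2.0, inner boundary pixels get 1.0, and outer pixels get -1.0 or -1.414, so interior pixels now vary with shape instead of all reading 1. The brute-force search is only for illustration; a linear-time Euclidean distance transform such as `scipy.ndimage.distance_transform_edt` would be the practical choice.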
Abstract
Gait recognition, which aims to identify individuals by their walking patterns, has achieved great success with silhouette-based methods. A binary silhouette sequence encodes the walking pattern only through its sparse boundary. As a result, most pixels in a silhouette are only weakly sensitive to the walking pattern: the sparse boundary lacks dense spatial-temporal information, which is better represented by a dense texture. To enhance sensitivity to the walking pattern while maintaining recognition robustness, we present a Complementary Learning with neural Architecture Search (CLASH) framework, consisting of a walking-pattern-sensitive gait descriptor named the dense spatial-temporal field (DSTF) and neural architecture search based complementary learning (NCL). Specifically, DSTF transforms the representation from the sparse binary boundary into a dense distance-based texture that is sensitive to the walking pattern at the pixel level. Further, NCL introduces a task-specific search space for complementary learning, in which the sensitivity of DSTF and the robustness of the silhouette complement each other to represent the walking pattern effectively. Extensive experiments demonstrate the effectiveness of the proposed methods in both in-the-lab and in-the-wild scenarios. On CASIA-B, we achieve rank-1 accuracies of 98.8%, 96.5%, and 89.3% under the normal-walking (NM), bag-carrying (BG), and clothing (CL) conditions, respectively. On OU-MVLP, we achieve a rank-1 accuracy of 91.9%. On the latest in-the-wild datasets, Gait3D and GREW, we outperform the latest silhouette-based methods by 16.3% and 19.7%, respectively.
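NCL is described here only at a high level, as a task-specific search space in which silhouette and DSTF features complement each other. The sketch below assumes a DARTS-style continuous relaxation, a technique swapped in for illustration and not confirmed as the paper's search strategy: one fusion edge mixes a hypothetical set of candidate operations using softmax-weighted architecture parameters `alpha`, and after search the strongest operation is kept. The class `MixedFusionEdge` and the operation set are hypothetical, and the step that learns `alpha` (gradient descent on validation loss in DARTS) is omitted.

```python
import numpy as np

# Hypothetical candidate fusion operations between a silhouette feature `s`
# and a DSTF feature `d` of the same shape (not the paper's actual search space).
CANDIDATE_OPS = {
    "sum": lambda s, d: s + d,
    "product": lambda s, d: s * d,
    "max": lambda s, d: np.maximum(s, d),
    "silhouette_only": lambda s, d: s,
    "dstf_only": lambda s, d: d,
}


def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())  # shift for numerical stability
    return e / e.sum()


class MixedFusionEdge:
    """DARTS-style relaxed edge: a softmax-weighted mixture of all candidate ops."""

    def __init__(self, seed: int = 0) -> None:
        rng = np.random.default_rng(seed)
        self.names = list(CANDIDATE_OPS)
        # Architecture parameters; DARTS would learn these on validation loss (omitted here).
        self.alpha = rng.normal(scale=1e-3, size=len(self.names))

    def forward(self, s: np.ndarray, d: np.ndarray) -> np.ndarray:
        # Continuous relaxation: every candidate op contributes, weighted by softmax(alpha).
        weights = softmax(self.alpha)
        return sum(w * CANDIDATE_OPS[n](s, d) for w, n in zip(weights, self.names))

    def derive(self) -> str:
        """Discretize after search: keep the op with the largest architecture weight."""
        return self.names[int(np.argmax(self.alpha))]


if __name__ == "__main__":
    edge = MixedFusionEdge()
    sil, dstf = np.array([1.0, 2.0]), np.array([3.0, 4.0])
    print(edge.forward(sil, dstf), edge.derive())
```

Including the two single-descriptor operations lets the search fall back to either stream alone when fusion does not help; that design choice is also an assumption of this sketch.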
