Causality-inspired Discriminative Feature Learning in Triple Domains for Gait Recognition
Haijun Xiong, Bin Feng, Xinggang Wang, Wenyu Liu
TL;DR
This work tackles the persistent problem of ID feature entanglement with non-ID confounders in gait recognition by introducing CLTD, a causality-inspired discriminative learning module. CLTD integrates a Cross Pixel-wise Attention Generator and a Fourier Projection Head to eliminate confounders across spatial, temporal, and spectral domains, supervised by a Factual and Counterfactual Loss that leverages InfoNCE and Total Direct Effect concepts. By deploying CLTD at multiple stages of a gait backbone, the method achieves state-of-the-art performance across multiple datasets (OU-MVLP, CASIA-B, GREW, Gait3D) and demonstrates strong robustness on wild data, with notable improvements over baselines. The approach offers a versatile, plug-and-play training paradigm that can enhance diverse gait recognition models and potentially extend to other computer vision tasks requiring robust, confounder-free representations.
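The loss summarized above builds on two standard ingredients, InfoNCE and the Total Direct Effect (TDE). The Python sketch below illustrates both in generic form and is not the authors' implementation. The function names `info_nce` and `total_direct_effect`, the cosine-similarity scoring, and the temperature value are assumptions made for illustration, and how the two terms are combined in CLTD is not specified here.

```python
import numpy as np


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss for a single anchor (hypothetical helper).

    anchor:    (D,)   feature of one sequence
    positive:  (D,)   feature of another sequence from the same subject
    negatives: (K, D) features of sequences from other subjects
    """
    def normalize(x):
        # L2-normalize so that dot products are cosine similarities.
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    a = normalize(np.asarray(anchor, dtype=float))
    p = normalize(np.asarray(positive, dtype=float))
    n = normalize(np.asarray(negatives, dtype=float))

    pos_logit = np.dot(a, p) / temperature   # scalar
    neg_logits = (n @ a) / temperature       # shape (K,)
    logits = np.concatenate([[pos_logit], neg_logits])

    # Numerically stable log-sum-exp over the positive and all negatives.
    m = logits.max()
    log_denominator = m + np.log(np.exp(logits - m).sum())
    return float(log_denominator - pos_logit)


def total_direct_effect(factual_logits, counterfactual_logits):
    """TDE in its generic form: the prediction under factual features
    minus the prediction under counterfactual features (assumed usage)."""
    return np.asarray(factual_logits, dtype=float) - np.asarray(
        counterfactual_logits, dtype=float
    )
```

In this sketch the InfoNCE term is small when the anchor is closer to its positive than to every negative, and the TDE term isolates the part of a prediction that changes when factual features are swapped for counterfactual ones.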
Abstract
Gait recognition is a biometric technology that distinguishes individuals by their walking patterns. However, previous methods struggle to extract identity features accurately because these features often become entangled with non-identity clues. To address this challenge, we propose CLTD, a causality-inspired discriminative feature learning module designed to eliminate the influence of confounders in triple domains, i.e., spatial, temporal, and spectral. Specifically, we utilize the Cross Pixel-wise Attention Generator (CPAG) to generate attention distributions for factual and counterfactual features in the spatial and temporal domains. Then, we introduce the Fourier Projection Head (FPH) to project spatial features into the spectral space, which preserves essential information while reducing computational costs. Additionally, we employ an optimization method with contrastive learning to enforce semantic consistency constraints across sequences from the same subject. Our approach yields significant performance improvements on challenging datasets, demonstrating its effectiveness. Moreover, it can be seamlessly integrated into existing gait recognition methods.
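To make the idea of projecting spatial features into the spectral space concrete, the following Python sketch applies a 2D discrete Fourier transform to a feature map and keeps only a small block of low-frequency amplitudes. It is an illustration under stated assumptions, not the FPH itself: the function name `fourier_project`, the `keep` parameter, the low-frequency cropping, and the use of amplitudes are all hypothetical choices, since the abstract does not describe how the FPH selects or processes spectral components.

```python
import numpy as np


def fourier_project(features, keep=4):
    """Project a (C, H, W) real feature map to a compact spectral form.

    Assumed design: take the 2D FFT per channel, center the zero
    frequency, and keep a keep x keep block of low-frequency amplitudes.
    """
    features = np.asarray(features, dtype=float)
    _, height, width = features.shape

    # 2D FFT over the spatial axes, then move the DC term to the center.
    spectrum = np.fft.fft2(features, axes=(-2, -1))
    spectrum = np.fft.fftshift(spectrum, axes=(-2, -1))

    # Crop a keep x keep window around the centered DC component.
    h0 = height // 2 - keep // 2
    w0 = width // 2 - keep // 2
    low_freq = spectrum[:, h0:h0 + keep, w0:w0 + keep]

    # Amplitudes only; phase is discarded in this sketch.
    return np.abs(low_freq)


# Usage: a 16 x 16 map per channel shrinks to 4 x 4 spectral values.
feature_map = np.random.default_rng(0).normal(size=(8, 16, 16))
compact = fourier_project(feature_map, keep=4)
print(compact.shape)
```

Under these assumptions each channel is reduced from 256 spatial values to 16 spectral values, which is one simple way a spectral projection can cut downstream computation while retaining the coarse structure of the feature map.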
