Unsupervised Gait Recognition with Selective Fusion
Xuqian Ren, Shaopeng Yang, Saihui Hou, Chunshui Cao, Xu Liu, Yongzhen Huang
TL;DR
The paper tackles Unsupervised Gait Recognition (UGR), which aims to learn gait representations from unlabeled data. It introduces a cluster-based baseline built on cluster-contrastive learning and memory banks, then addresses two practical challenges, cross-cloth variation and front/back view ambiguity, with Selective Fusion (SF). SF has two parts: Selective Cluster Fusion (SCF) and Selective Sample Fusion (SSF). SCF uses a cloth-aware, multi-cluster update mechanism to fuse same-identity sequences across clothes, while SSF employs a view classifier and curriculum learning to gradually align front/back view data with data from other views. Across CASIA-BN, Outdoor-Gait, and GREW, the SF framework consistently improves rank-1 accuracy, showing robustness to clothing changes and view variations and reducing dependence on labeled data. The work moves toward scalable unsupervised gait recognition that can be fine-tuned on unlabeled datasets for real-world deployment.
Abstract
Previous gait recognition methods are primarily trained on labeled datasets, which require laborious labeling effort. However, applying a pre-trained model to a new dataset without fine-tuning can lead to significant performance degradation. To enable a pre-trained gait recognition model to be fine-tuned on unlabeled datasets, we propose a new task: Unsupervised Gait Recognition (UGR). We introduce a new cluster-based baseline that solves UGR with cluster-level contrastive learning. We further identify two challenges that this task faces. First, sequences of the same person in different clothes tend to cluster separately due to the significant appearance changes. Second, sequences taken from 0° and 180° views lack walking postures and do not cluster with sequences taken from other views. To address these challenges, we propose a Selective Fusion method, which includes Selective Cluster Fusion (SCF) and Selective Sample Fusion (SSF). With SCF, we merge matched clusters of the same person wearing different clothes by updating the cluster-level memory bank with a multi-cluster update strategy. In SSF, we gradually merge sequences taken from front/back views using curriculum learning. Extensive experiments show that our method improves rank-1 accuracy under the walking-with-different-coats condition and the front/back view conditions.
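To make the cluster-level memory bank and the multi-cluster update more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes an InfoNCE-style loss against cluster centroids and a momentum update of those centroids; the function names, the `matched` mapping (cluster id to the ids of clusters believed to show the same person in other clothes), and the `temperature` and `momentum` values are all hypothetical choices made for illustration. How matched clusters are actually found is not described in the abstract and is left out here.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def cluster_contrastive_loss(features, labels, memory, temperature=0.05):
    """InfoNCE-style loss between sequence features and cluster centroids.

    features: (B, D) sequence embeddings
    labels:   (B,) pseudo-labels (cluster ids) from a clustering step
    memory:   (K, D) unit-norm centroids, one row per cluster
    """
    logits = l2_normalize(features) @ memory.T / temperature  # (B, K)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(labels)), labels].mean()


def update_memory(memory, features, labels, matched=None, momentum=0.2):
    """Momentum update of centroids, returning a new memory array.

    Assumption: `matched` maps a cluster id to other cluster ids that are
    treated as the same identity, so one feature updates several centroids
    (a simple stand-in for a multi-cluster update).
    """
    memory = memory.copy()
    matched = matched or {}
    for feat, k in zip(l2_normalize(features), labels):
        k = int(k)
        for target in [k] + list(matched.get(k, [])):
            blended = momentum * memory[target] + (1 - momentum) * feat
            memory[target] = l2_normalize(blended)
    return memory
```

In this sketch, a feature assigned to cluster 0 with `matched={0: [1]}` pulls both centroids 0 and 1 toward it, so over many updates the two centroids drift together; without `matched`, the function reduces to an ordinary single-cluster momentum update.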
