Unsupervised Gait Recognition with Selective Fusion

Xuqian Ren; Shaopeng Yang; Saihui Hou; Chunshui Cao; Xu Liu; Yongzhen Huang

TL;DR

The paper tackles Unsupervised Gait Recognition (UGR), which aims to learn gait representations from unlabeled data. It introduces a cluster-based baseline built on cluster-contrastive learning and memory banks, then addresses two practical challenges, cross-cloth variation and front/back view ambiguity, with Selective Fusion (SF), which comprises Selective Cluster Fusion (SCF) and Selective Sample Fusion (SSF). SCF uses a cloth-aware, multi-cluster update mechanism to fuse sequences of the same identity across clothes, while SSF employs a view classifier and curriculum learning to gradually align front/back view data with data from other views. On CASIA-BN, Outdoor-Gait, and GREW, the SF framework consistently improves rank-1 accuracy, showing robustness to clothing changes and view variations and reducing dependence on labeled data. The work moves toward scalable unsupervised gait recognition in which a pre-trained model can be fine-tuned on unlabeled datasets for real-world deployment.

Abstract

Previous gait recognition methods are primarily trained on labeled datasets, which require laborious labeling effort. However, applying a pre-trained model to a new dataset without fine-tuning can lead to significant performance degradation. To allow a pre-trained gait recognition model to be fine-tuned on unlabeled datasets, we propose a new task: Unsupervised Gait Recognition (UGR). We introduce a new cluster-based baseline that solves UGR with cluster-level contrastive learning. We further identify two challenges that this task faces. First, sequences of the same person in different clothes tend to cluster separately due to the significant appearance changes. Second, sequences taken from 0° and 180° views lack walking postures and do not cluster with sequences taken from other views. To address these challenges, we propose a Selective Fusion method, which includes Selective Cluster Fusion (SCF) and Selective Sample Fusion (SSF). With SCF, we merge matched clusters of the same person wearing different clothes by updating the cluster-level memory bank with a multi-cluster update strategy. With SSF, we gradually merge sequences taken from front/back views using curriculum learning. Extensive experiments show that our method improves rank-1 accuracy under the walking-with-different-coats condition and the front/back view conditions.
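
The cluster-level contrastive learning mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes an InfoNCE-style loss over L2-normalized cluster centroids held in a memory bank, plus a momentum update of the assigned centroid, as is common in cluster-contrast methods. The function names, the update rule `m * centroid + (1 - m) * query`, the re-normalization step, and the values of `tau` and `m` are all assumptions chosen for illustration.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cluster_nce_loss(query, memory, pseudo_label, tau=0.5):
    """InfoNCE over cluster centroids: -log softmax(q . c_k / tau)[pseudo_label]."""
    q = l2_normalize(query)
    logits = [dot(q, c) / tau for c in memory]
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum - logits[pseudo_label]

def momentum_update(memory, query, pseudo_label, m=0.2):
    """Move the assigned centroid toward the query, then re-normalize (assumed rule)."""
    q = l2_normalize(query)
    old = memory[pseudo_label]
    new = [m * c + (1.0 - m) * x for c, x in zip(old, q)]
    memory[pseudo_label] = l2_normalize(new)

# Toy memory bank with two cluster centroids and one query feature
# whose pseudo label (from clustering) is cluster 0.
memory = [l2_normalize([1.0, 0.0]), l2_normalize([0.0, 1.0])]
query = [0.9, 0.1]

loss_before = cluster_nce_loss(query, memory, pseudo_label=0)
momentum_update(memory, query, pseudo_label=0)
loss_after = cluster_nce_loss(query, memory, pseudo_label=0)
```

In this toy setting the loss for the query drops after the update, because the centroid of its assigned cluster has moved closer to it while the other centroid is unchanged. A multi-cluster update in the spirit of SCF would presumably apply such an update to several matched clusters at once, but the abstract does not specify how.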

Paper Structure (68 sections, 12 equations, 12 figures, 22 tables, 2 algorithms)

Introduction
Related Work
Gait Recognition
Unsupervised Person Re-identification
Our Method
Problem Formulation
Proposed Baseline
Proposed Method
Cloth Augmentation
Selective Cluster Fusion
Selective Sample Fusion
Training Strategy
Experiments
Datasets
CASIA-BN
...and 53 more sections

Figures (12)

Figure 1: Two main challenges in UGR. Markers of one style in different colors denote a single subject in different clothes; such sequences are usually erroneously assigned different pseudo labels (e.g., '001', '002'). In addition, sequences of different subjects taken from front/back views tend to mix together (e.g., '003').
Figure 2: Overview of the framework with Selective Fusion. The upper two branches generate pseudo labels and initialize a memory bank at the start of each epoch. The lower branch takes mini-batches sampled from the pseudo clusters and computes the ClusterNCE loss against the memory bank, updating both the memory bank and the backbone in each iteration. CA is the Cloth Augmentation method. InfoMap is employed in the Cluster module. SCF denotes Selective Cluster Fusion and SSF denotes Selective Sample Fusion. In the Support set, the darker the color, the higher the similarity with the target cluster (best viewed in color).
Figure 3: Visualization of data augmentation under the NM and CL conditions. Cloth Augmentation can simulate the potential appearance of the same person under different conditions.
Figure 4: The three stages of our training strategy. First, starting from the narrowly distributed features extracted by the pre-trained model, we adopt our baseline to separate them further. Then Selective Fusion is used to fuse matched clusters and samples. Finally, we obtain clusters that contain different clothes and views. "Base" denotes the Baseline. Each marker type in a different color indicates a subject in a different cloth condition.
Figure 5: The effect of the hyper-parameters $s_{up}/n/\tau/m$ on the baselines. Here we choose the set of hyper-parameters that gives the best result in our experiments. Other settings do not change the result much; they only lead to sub-optimal results.
...and 7 more figures
