Efficient Bilateral Cross-Modality Cluster Matching for Unsupervised Visible-Infrared Person ReID

De Cheng; Lingfeng He; Nannan Wang; Shizhou Zhang; Zhen Wang; Xinbo Gao

TL;DR

This work proposes a bilateral cluster matching-based learning framework that reduces the modality gap by matching cross-modality clusters. Its core is a Many-to-many Bilateral Cross-Modality Cluster Matching (MBCCM) algorithm, formulated as a maximum matching problem in a bipartite graph.

Abstract

Unsupervised visible-infrared person re-identification (USL-VI-ReID) aims to match pedestrian images of the same identity across modalities without annotations. Existing works mainly alleviate the modality gap by aligning instance-level features of the unlabeled samples, leaving the relationships between cross-modality clusters largely unexplored. To this end, we propose a novel bilateral cluster matching-based learning framework that reduces the modality gap by matching cross-modality clusters. Specifically, we design a Many-to-many Bilateral Cross-Modality Cluster Matching (MBCCM) algorithm that solves a maximum matching problem in a bipartite graph. The matched cluster pairs then share visible and infrared pseudo-labels during model training. Under this supervisory signal, a Modality-Specific and Modality-Agnostic (MSMA) contrastive learning framework is proposed to jointly align features at the cluster level. In addition, a cross-modality Consistency Constraint (CC) is proposed to explicitly reduce the large modality discrepancy. Extensive experiments on the public SYSU-MM01 and RegDB datasets demonstrate the effectiveness of the proposed method, which surpasses state-of-the-art approaches by a large margin of 8.76% mAP on average.
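To make the cluster-matching idea concrete, the following is a minimal sketch rather than the authors' MBCCM implementation. This page does not give the cost function or the exact matching rule, so several things here are assumptions: the cost is taken to be cosine distance between cluster centroids, a brute-force search stands in for a polynomial-time bipartite matching solver (such as the Hungarian algorithm), and clusters left over after the one-to-one pass are attached to their nearest cross-modality cluster to obtain a many-to-many result. All function names (`bilateral_match`, `shared_labels`, and so on) are hypothetical.

```python
from itertools import permutations


def cosine_distance(u, v):
    """1 - cosine similarity between two centroid vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return 1.0 - dot / (norm_u * norm_v)


def one_to_one_match(cost):
    """Min-cost matching of every row to a distinct column (rows <= cols).

    Exhaustive search, only practical for tiny examples; a real system
    would use a polynomial-time assignment solver instead.
    """
    n_rows, n_cols = len(cost), len(cost[0])
    assert n_rows <= n_cols
    best = min(
        permutations(range(n_cols), n_rows),
        key=lambda cols: sum(cost[r][c] for r, c in enumerate(cols)),
    )
    return {(r, c) for r, c in enumerate(best)}


def bilateral_match(vis_centroids, ir_centroids):
    """Return a set of (visible_cluster, infrared_cluster) index pairs."""
    cost = [[cosine_distance(v, r) for r in ir_centroids] for v in vis_centroids]
    n_vis, n_ir = len(cost), len(cost[0])

    # Pass 1: one-to-one matching that covers the smaller side entirely.
    if n_vis <= n_ir:
        pairs = one_to_one_match(cost)
    else:
        transposed = [list(col) for col in zip(*cost)]
        pairs = {(v, r) for r, v in one_to_one_match(transposed)}

    # Pass 2 (assumed rule): attach each leftover cluster to its nearest
    # cluster in the other modality, which yields many-to-many matches.
    matched_vis = {v for v, _ in pairs}
    matched_ir = {r for _, r in pairs}
    for v in range(n_vis):
        if v not in matched_vis:
            pairs.add((v, min(range(n_ir), key=lambda r: cost[v][r])))
    for r in range(n_ir):
        if r not in matched_ir:
            pairs.add((min(range(n_vis), key=lambda v: cost[v][r]), r))
    return pairs


def shared_labels(pairs, n_vis, n_ir):
    """Give matched clusters one shared pseudo-label via union-find."""
    parent = list(range(n_vis + n_ir))  # infrared cluster r is node n_vis + r

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for v, r in pairs:
        parent[find(v)] = find(n_vis + r)

    roots = {}
    labels = [roots.setdefault(find(x), len(roots)) for x in range(n_vis + n_ir)]
    return labels[:n_vis], labels[n_vis:]
```

As a usage illustration with made-up 2-D centroids, suppose visible clusters 0 and 1 are two fragments of the same identity and there are only two infrared clusters. Calling `bilateral_match([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]], [[1.0, 0.05], [0.05, 1.0]])` links both visible fragments to infrared cluster 0, and `shared_labels` then assigns all three the same pseudo-label. A strictly one-to-one matching would leave one visible fragment without a cross-modality partner.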

Paper Structure (18 sections, 10 equations, 6 figures, 3 tables, 1 algorithm)

Introduction
Related Work
Supervised Cross-modality person ReID
Unsupervised single-modality person ReID
Unsupervised Cross-modality person ReID
Proposed method
Overview
Many-to-Many Bilateral Cross-modality Cluster Matching (MBCCM)
Modality-specific and Modality-agnostic (MSMA) Contrastive Learning
Cross-modality Consistency Constraint (CC)
Optimization
Experiment
Dataset and Evaluation Protocol
Implementation Details
Comparison with State-of-the-art Methods
...and 3 more sections

Figures (6)

Figure 1: The information exchange between modalities is a vital factor that affects the performance of VI-ReID. Existing methods always utilize the information between pairwise similar instances at an (a) instance-level, which cannot holistically build relationships between cross-modality classes. To address this issue, we propose a (b) cluster-level matching and generate shared-label cross-modality clusters that provide supervision for network training.
Figure 2: The overall framework of our method. Our method consists of a clustering stage, a matching stage, and a training stage. In the clustering stage (a), we assign pseudo labels to samples from each modality. In the matching stage (b), we utilize Many-to-many Bilateral Cross-Modality Cluster Matching (MBCCM) to perform a cluster-level match between cross-modality clusters. In the training stage (c)/(d), modality-specific and modality-agnostic (MSMA) memory banks jointly construct a contrastive learning framework, and the Consistency Constraint (CC) module further reduces the modality gap.
Figure 3: (a) A sketch map of the maximum matching problem in a weighted bipartite graph. (b) An example of a wrong match is shown, where two visible clusters with the same ID are matched to different infrared clusters under the one-to-one matching paradigm.
Figure 4: Performance of our framework with different values of $\alpha$ and $\beta$ on SYSU-MM01 and RegDB datasets.
Figure 5: The t-SNE visualization of 10 randomly selected identities. Different colors represent different ground-truth identities. "$\mathbf{\square}$" denotes the samples from infrared modality while "$\mathbf{\triangle}$" denotes the samples from visible modality.
...and 1 more figure
