Multi-Memory Matching for Unsupervised Visible-Infrared Person Re-Identification
Jiangming Shi, Xiangbo Yin, Yeyun Chen, Yachao Zhang, Zhizhong Zhang, Yuan Xie, Yanyun Qu
TL;DR
This work tackles unsupervised visible-infrared person re-identification (USL-VI-ReID), where cross-modality pseudo-labels and the correspondences built from them are often unreliable. It proposes Multi-Memory Matching (MMM), which comprises three modules: Cross-Modality Clustering (CMC), which generates pseudo-labels by clustering samples from both modalities together; Multi-Memory Learning and Matching (MMLM), which represents each identity with multiple memories and associates clusters across modalities through bipartite matching; and Soft Cluster-level Alignment (SCA), which narrows the modality gap with a soft alignment that is robust to noisy pseudo-labels. The work uses ARI as a metric for the reliability of the correspondences and reports state-of-the-art performance on SYSU-MM01 and RegDB, supported by ablations and hyper-parameter analyses. The authors state that the source code will be released.
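To make the reliability metric concrete, the following is a minimal sketch of how an ARI score could be computed, assuming that ARI here denotes the Adjusted Rand Index between a pseudo-labeling and a reference labeling of the same samples. The function name `adjusted_rand_index` and the toy labels are illustrative and are not taken from the paper.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two labelings of the same samples.

    Illustrative helper (hypothetical name), not the authors' code.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    n = len(labels_a)

    # Contingency counts: samples per (label_a, label_b) pair and per label.
    pair_counts = Counter(zip(labels_a, labels_b))
    a_counts = Counter(labels_a)
    b_counts = Counter(labels_b)

    # Number of sample pairs that share a cell, a row, or a column.
    sum_pairs = sum(comb(c, 2) for c in pair_counts.values())
    sum_a = sum(comb(c, 2) for c in a_counts.values())
    sum_b = sum(comb(c, 2) for c in b_counts.values())
    total = comb(n, 2)

    # Agreement expected by chance, and the maximum attainable agreement.
    expected = sum_a * sum_b / total if total else 0.0
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both labelings are trivial
    return (sum_pairs - expected) / (max_index - expected)


# Toy data (assumed): reference identities vs. clustered pseudo-labels.
reference = [0, 0, 1, 1]
pseudo_same_partition = [1, 1, 0, 0]   # same grouping, different label names
pseudo_mixed = [0, 1, 0, 1]            # every group is split across identities
```

Because the score depends only on which samples are grouped together, relabeling the clusters does not change it. A value of 1 indicates identical partitions, and values near or below 0 indicate agreement no better than chance.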
Abstract
Unsupervised visible-infrared person re-identification (USL-VI-ReID) is a promising yet challenging retrieval task. Its key challenges are generating pseudo-labels effectively and establishing pseudo-label correspondences across modalities without relying on any prior annotations. Clustering-based pseudo-label methods have recently gained attention in USL-VI-ReID. However, previous methods represent each identity with a single memory when establishing cross-modality correspondences. A single memory cannot capture the nuances of an individual, such as variation across perspectives, so the resulting correspondences are ambiguous. To address this problem, we propose a Multi-Memory Matching (MMM) framework for USL-VI-ReID. We first design a Cross-Modality Clustering (CMC) module that generates pseudo-labels by clustering samples from both modalities together. To associate the clustered pseudo-labels across modalities, we design a Multi-Memory Learning and Matching (MMLM) module, which ensures that optimization explicitly focuses on the nuances of individual perspectives and establishes reliable cross-modality correspondences. Finally, we design a Soft Cluster-level Alignment (SCA) module that narrows the modality gap while mitigating the effect of noisy pseudo-labels through a soft many-to-many alignment strategy. Extensive experiments on the public SYSU-MM01 and RegDB datasets demonstrate the reliability of the established cross-modality correspondences and the effectiveness of MMM. The source code will be released.
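The idea of matching clusters through multiple memories can be illustrated with a small sketch. The abstract does not specify how memories are built or how matching costs are defined, so everything below is an assumption: each cluster is given as a short list of memory vectors, the cost between a visible cluster and an infrared cluster is the smallest Euclidean distance over their memory pairs, and the one-to-one assignment is found by exhaustive search. Exhaustive search is used only to keep the example dependency-free; a Hungarian-algorithm solver would be the practical choice for realistic cluster counts. The names `cluster_cost` and `match_clusters` are hypothetical.

```python
from itertools import permutations
from math import dist


def cluster_cost(vis_memories, ir_memories):
    """Smallest Euclidean distance over all memory pairs of two clusters.

    Assumed cost definition for illustration; not taken from the paper.
    """
    return min(dist(v, r) for v in vis_memories for r in ir_memories)


def match_clusters(visible, infrared):
    """Minimum-cost one-to-one assignment of visible to infrared clusters.

    `visible` and `infrared` are lists of clusters, and each cluster is a
    list of memory vectors. Returns a list where entry i is the index of
    the infrared cluster matched to visible cluster i. Assumes both
    modalities have the same number of clusters (a simplification).
    """
    if len(visible) != len(infrared):
        raise ValueError("this sketch assumes equal cluster counts")

    # Pairwise cost matrix between visible and infrared clusters.
    cost = [[cluster_cost(v, r) for r in infrared] for v in visible]

    # Brute-force bipartite matching: try every permutation, keep the cheapest.
    best = min(
        permutations(range(len(infrared))),
        key=lambda perm: sum(cost[i][j] for i, j in enumerate(perm)),
    )
    return list(best)


# Toy data (assumed): two clusters per modality, two 2-D memories each.
visible = [[(0.0, 0.0), (0.0, 1.0)], [(5.0, 5.0), (5.0, 6.0)]]
infrared = [[(5.0, 5.5), (9.0, 9.0)], [(0.0, 0.5), (-3.0, -3.0)]]
matching = match_clusters(visible, infrared)
```

In this toy setup, each infrared cluster has one memory close to a visible cluster and one far from it. Averaging the two into a single memory would blur that signal, whereas keeping them separate lets the closest pair drive the match.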
