Unsupervised Visible-Infrared ReID via Pseudo-label Correction and Modality-level Alignment
Yexin Liu, Weiming Zhang, Athanasios V. Vasilakos, Lin Wang
TL;DR
This work tackles unsupervised visible–infrared person re-identification by addressing two core issues: noisy pseudo labels from intra-modality clustering and cross-modality misalignment between visible and infrared features. It introduces PRAISE, a theory-informed framework that combines Pseudo-Label Correction (PLC) using a Beta Mixture Model to weigh pseudo-label noise in a perceptual-contrastive loss, with Modality-level Alignment (MLA) that employs bi-directional latent translation and centroid-based matching (SFM) plus CMA and LFC losses to align modalities and label functions. A generalization bound based on ${\cal H}$-divergence and empirical Rademacher complexity motivates the dual focus on reducing intra-modality errors and enforcing cross-modal alignment. Empirically, PRAISE achieves state-of-the-art performance among fully unsupervised VI-ReID methods on SYSU-MM01 and RegDB, approaching supervised VI-ReID at higher ranks and offering a practical avenue for cross-modal person re-identification without paired annotations.
Abstract
Unsupervised visible-infrared person re-identification (UVI-ReID) has recently gained great attention due to its potential for enhancing human detection in diverse environments without labeling. Previous methods utilize intra-modality clustering and cross-modality feature matching to achieve UVI-ReID. However, two challenges remain: 1) the clustering process may generate noisy pseudo labels, and 2) cross-modality feature alignment that matches the marginal distributions of the visible and infrared modalities may misalign different identities across the two modalities. In this paper, we first conduct a theoretical analysis that introduces an interpretable generalization upper bound. Based on this analysis, we then propose a novel unsupervised cross-modality person re-identification framework (PRAISE). Specifically, to address the first challenge, we propose a pseudo-label correction strategy that utilizes a Beta Mixture Model to predict the probability of mis-clustering based on the network's memory effect and rectifies the correspondence by adding a perceptual term to contrastive learning. Next, we introduce a modality-level alignment strategy that generates paired visible-infrared latent features and reduces the modality gap by aligning the labeling functions of visible and infrared features, so as to learn identity-discriminative and modality-invariant features. Experimental results on two benchmark datasets demonstrate that our method achieves state-of-the-art performance, outperforming unsupervised visible-ReID methods.
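
As an illustration of the pseudo-label correction idea, the sketch below fits a two-component Beta Mixture Model to per-sample losses and returns the posterior probability that each sample belongs to the high-loss component. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`beta_pdf`, `noisy_label_posterior`), the min-max normalization of losses, the EM update based on a weighted method of moments, and the reading of the high-loss component as "mis-clustered" are all assumptions made for illustration. The underlying intuition is the memory effect: networks tend to fit correctly labeled samples first, so mis-clustered samples tend to keep a higher loss early in training.

```python
import math


def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x, for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def noisy_label_posterior(losses, iters=50, eps=1e-4):
    """Fit a two-component Beta mixture to per-sample losses with EM and
    return, per sample, the posterior of the high-loss component
    (assumed here to model mis-clustered samples)."""
    lo, hi = min(losses), max(losses)
    span = (hi - lo) or 1.0
    # Min-max normalise into (0, 1) and clip away from the endpoints.
    x = [min(max((v - lo) / span, eps), 1 - eps) for v in losses]

    params = [(1.0, 2.0), (2.0, 1.0)]  # initial guess: low-loss, high-loss
    weights = [0.5, 0.5]

    def responsibilities():
        out = []
        for xi in x:
            p = [weights[k] * beta_pdf(xi, *params[k]) for k in range(2)]
            total = p[0] + p[1]
            out.append((p[0] / total, p[1] / total) if total > 0 else (0.5, 0.5))
        return out

    for _ in range(iters):
        resp = responsibilities()  # E-step
        for k in range(2):  # M-step: weighted method of moments
            rk = [r[k] for r in resp]
            mass = sum(rk)
            if mass < 1e-8:
                continue  # component has no support; keep old parameters
            mean = sum(r * xi for r, xi in zip(rk, x)) / mass
            var = sum(r * (xi - mean) ** 2 for r, xi in zip(rk, x)) / mass
            var = max(var, 1e-6)
            common = mean * (1 - mean) / var - 1
            if common <= 0:
                continue  # moments do not define a valid Beta; skip update
            params[k] = (mean * common, (1 - mean) * common)
            weights[k] = mass / len(x)

    # The component with the larger mean is treated as the noisy one.
    means = [a / (a + b) for a, b in params]
    noisy = 0 if means[0] > means[1] else 1
    return [r[noisy] for r in responsibilities()]
```

One way such a posterior could be used, again as an assumption rather than the paper's exact formulation, is to down-weight each sample's contrastive term by `1 - p_noisy`, so that samples likely to be mis-clustered contribute less to the loss.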
