Modality-Aware Bias Mitigation and Invariance Learning for Unsupervised Visible-Infrared Person Re-Identification
Menglin Wang, Xiaojin Gong, Jiachen Li, Genlin Ji
TL;DR
The paper tackles unsupervised cross-modality person re-identification by addressing modality-induced bias and variance within clusters. It introduces a modality-aware Jaccard distance for global cross-modality association and a split-and-contrast strategy that builds modality-specific global prototypes, yielding modality-invariant yet ID-discriminative representations. Training proceeds in two stages (intra-modality clustering, then bias-mitigated global clustering) and uses a multi-positive contrastive loss to align the modalities within each global cluster. Results on SYSU-MM01 and RegDB show state-of-the-art performance among unsupervised VI-ReID approaches, supporting the effectiveness of bias mitigation and invariance learning for cross-modality matching.
Abstract
Unsupervised visible-infrared person re-identification (USVI-ReID) aims to match individuals across visible and infrared cameras without relying on any annotations. Given the significant gap between the visible and infrared modalities, estimating reliable cross-modality associations is a major challenge in USVI-ReID. Existing methods usually adopt optimal transport to associate intra-modality clusters, which is prone to propagating local cluster errors and overlooks global instance-level relations. By mining and attending to the visible-infrared modality bias, this paper addresses cross-modality learning from two aspects: bias-mitigated global association and modality-invariant representation learning. Motivated by camera-aware distance rectification in single-modality re-ID, we propose a modality-aware Jaccard distance to mitigate the distance bias caused by the modality discrepancy, so that more reliable cross-modality associations can be estimated through global clustering. To further improve cross-modality representation learning, a "split-and-contrast" strategy is designed to obtain modality-specific global prototypes. By explicitly aligning these prototypes under global association guidance, modality-invariant yet ID-discriminative representation learning can be achieved. While conceptually simple, our method obtains state-of-the-art performance on benchmark VI-ReID datasets and outperforms existing methods by a significant margin, validating its effectiveness.
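
To make the prototype-alignment idea concrete, the following is a minimal sketch of a generic multi-positive contrastive loss, in which one feature is pulled toward several positive prototypes at once (for example, the visible and the infrared prototype of its global cluster). It is an illustration under stated assumptions, not the paper's exact formulation: the function names, the cosine-similarity scoring, the temperature value, and the toy prototypes are all hypothetical, and the abstract does not specify the loss in this detail.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def l2_normalize(v):
    """Scale a non-zero vector to unit length so dot products are cosines."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def multi_positive_contrastive_loss(feature, prototypes, positive_ids,
                                    temperature=0.05):
    """Average InfoNCE-style loss over several positive prototypes.

    feature      : one instance embedding (assumed non-zero)
    prototypes   : list of prototype vectors (all modalities, all clusters)
    positive_ids : indices of prototypes sharing the instance's global
                   cluster, e.g. its visible and infrared prototypes
    temperature  : softmax temperature (0.05 is an assumed value)
    """
    f = l2_normalize(feature)
    logits = [dot(f, l2_normalize(p)) / temperature for p in prototypes]
    # Numerically stable log-sum-exp over all prototypes (the denominator).
    m = max(logits)
    log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
    # Mean negative log-probability over the positive prototypes.
    return -sum(logits[i] - log_denom for i in positive_ids) / len(positive_ids)


# Toy setup (hypothetical values): two global clusters, each with a
# visible and an infrared prototype.
prototypes = [
    [1.0, 0.1],  # cluster 0, visible
    [0.9, 0.2],  # cluster 0, infrared
    [0.0, 1.0],  # cluster 1, visible
    [0.1, 0.9],  # cluster 1, infrared
]
feature = [1.0, 0.15]  # an instance that lies near cluster 0

loss_correct = multi_positive_contrastive_loss(feature, prototypes, [0, 1])
loss_wrong = multi_positive_contrastive_loss(feature, prototypes, [2, 3])
print(loss_correct, loss_wrong)
```

Treating both modality-specific prototypes of a cluster as positives means the loss is low only when the feature is close to both, which is one simple way to encourage modality invariance. Assigning the feature to the wrong cluster yields a much larger loss in this toy setup.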
