Mutual Information Guided Optimal Transport for Unsupervised Visible-Infrared Person Re-identification
Zhizhong Zhang, Jiangming Wang, Xin Tan, Yanyun Qu, Junping Wang, Yong Xie, Yuan Xie
TL;DR
This work tackles unsupervised visible-infrared person re-identification (USVI-ReID) by grounding cross-modal learning in mutual information (MI). It derives three guiding principles (Sharpness, Fairness, and Fitness) and implements an iterative training loop that alternates model updates with cross-modality prototype matching via a uniform-prior optimal transport (OT) assignment (OTPA). Prototype-based contrastive learning (PBCL) and cross prediction alignment (CPAL) exploit the resulting cross-modality correspondence to minimize intra- and cross-modality entropy, reaching Rank-1 accuracy of 60.6% on SYSU-MM01 and 90.3% on RegDB without labels. The approach improves on prior USVI-ReID methods and is competitive with supervised VI-ReID, with efficient computation and robustness to incomplete cross-modality overlap. These contributions advance unsupervised cross-modal learning by integrating MI theory, OT optimization, and prototype-based representation learning into a cohesive framework.
Abstract
Unsupervised visible-infrared person re-identification (USVI-ReID) is a challenging retrieval task that aims to retrieve cross-modality pedestrian images without using any label information. In this task, the large cross-modality variance makes it difficult to generate reliable cross-modality labels, and the lack of annotations makes it harder to learn modality-invariant features. In this paper, we first derive an optimization objective for unsupervised VI-ReID based on the mutual information between the model's cross-modality input and output. Through an equivalent derivation, we obtain three learning principles: "Sharpness" (entropy minimization), "Fairness" (uniform label distribution), and "Fitness" (reliable cross-modality matching). Under their guidance, we design an iterative training strategy that alternates between model training and cross-modality matching. In the matching stage, a uniform-prior-guided optimal transport assignment ("Fitness", "Fairness") is proposed to select matched visible and infrared prototypes. In the training stage, we use this matching information to introduce prototype-based contrastive learning that minimizes the intra- and cross-modality entropy ("Sharpness"). Extensive experimental results on benchmarks demonstrate the effectiveness of our method, e.g., Rank-1 accuracy of 60.6% on SYSU-MM01 and 90.3% on RegDB without any annotations.
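To make the matching stage more concrete, the following is a minimal sketch of optimal transport matching between visible and infrared prototypes under uniform marginals, solved with generic entropic-regularized Sinkhorn iterations. It is not the authors' OTPA implementation: the cosine cost, the regularization strength `eps`, the iteration count, and the function names `sinkhorn_uniform` and `match_prototypes` are all assumptions made for illustration.

```python
import numpy as np


def sinkhorn_uniform(cost, eps=0.05, n_iters=200):
    """Entropic OT plan for `cost` with uniform marginals on both sides.

    Illustrative only: eps and n_iters are assumed values, not taken
    from the paper.
    """
    n, m = cost.shape
    r = np.full(n, 1.0 / n)  # uniform prior over visible prototypes
    c = np.full(m, 1.0 / m)  # uniform prior over infrared prototypes
    K = np.exp(-cost / eps)  # Gibbs kernel
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        u = r / (K @ v)      # rescale rows toward marginal r
        v = c / (K.T @ u)    # rescale columns toward marginal c
    return u[:, None] * K * v[None, :]


def match_prototypes(vis_protos, ir_protos, eps=0.05, n_iters=200):
    """Match each visible prototype to an infrared prototype.

    Uses cosine distance as the transport cost (an assumption) and
    returns (matched infrared index per visible prototype, plan).
    """
    vis = vis_protos / np.linalg.norm(vis_protos, axis=1, keepdims=True)
    ir = ir_protos / np.linalg.norm(ir_protos, axis=1, keepdims=True)
    cost = 1.0 - vis @ ir.T
    plan = sinkhorn_uniform(cost, eps, n_iters)
    return plan.argmax(axis=1), plan


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    vis = rng.standard_normal((6, 32))                     # toy visible prototypes
    ir = vis[::-1] + 0.05 * rng.standard_normal((6, 32))   # reversed, noisy copies
    match, plan = match_prototypes(vis, ir)
    print(match)
    print(plan.sum(axis=0), plan.sum(axis=1))
```

The uniform marginals are what tie this sketch to the "Fairness" principle: every prototype on each side is forced to carry equal total mass, so no single infrared prototype can absorb most of the visible ones. A smaller `eps` gives a sharper, more nearly one-to-one plan at the cost of slower convergence and a higher risk of numerical underflow.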
