Robust Pseudo-label Learning with Neighbor Relation for Unsupervised Visible-Infrared Person Re-Identification

Xiangbo Yin, Jiangming Shi, Yachao Zhang, Yang Lu, Zhizhong Zhang, Yuan Xie, Yanyun Qu

TL;DR

This work tackles unsupervised visible-infrared person re-identification (USVI-ReID) by targeting two core issues: noisy pseudo-labels and unreliable cross-modality correspondences. It introduces the Robust Pseudo-label Learning with Neighbor Relation (RPNR) framework, which combines four modules: Noisy Pseudo-label Calibration to correct noisy labels, Neighbor Relation Learning to reduce intra-class variation, Optimal Transport Prototype Matching for cross-modality alignment, and Memory Hybrid Learning to jointly learn modality-specific and modality-invariant information. On the SYSU-MM01 and RegDB benchmarks, RPNR outperforms the previous state-of-the-art method, GUR, with an average Rank-1 improvement of 10.3%. By producing higher-quality pseudo-labels and more dependable cross-modality correspondences, the approach makes unsupervised VI-ReID more practical, with potential impact on surveillance and multi-modal analytics.

Abstract

Unsupervised Visible-Infrared Person Re-identification (USVI-ReID), which aims to match pedestrian images across visible and infrared modalities without any annotations, is a formidable challenge. Recently, clustering-based pseudo-label methods have become predominant in USVI-ReID, although the inherent noise in pseudo-labels remains a significant obstacle. Most existing works focus on shielding the model from the harmful effects of noise but neglect to calibrate the noisy pseudo-labels, which are usually associated with hard samples; this compromises the robustness of the model. To address this issue, we design a Robust Pseudo-label Learning with Neighbor Relation (RPNR) framework for USVI-ReID. Specifically, we first introduce a straightforward yet potent Noisy Pseudo-label Calibration module to correct noisy pseudo-labels. Because of high intra-class variation, noisy pseudo-labels are difficult to calibrate completely. Therefore, we introduce a Neighbor Relation Learning module that reduces intra-class variation by modeling potential interactions between all samples. Subsequently, we devise an Optimal Transport Prototype Matching module to establish reliable cross-modality correspondences. On that basis, we design a Memory Hybrid Learning module to jointly learn modality-specific and modality-invariant information. Comprehensive experiments conducted on two widely recognized benchmarks, SYSU-MM01 and RegDB, demonstrate that RPNR outperforms the current state-of-the-art GUR with an average Rank-1 improvement of 10.3%. The source code will be released soon.
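
To make the idea of matching cluster prototypes across modalities with optimal transport more concrete, here is a minimal sketch. The abstract does not state the actual formulation, so everything below is an assumption made for illustration: the cost is taken to be cosine distance between prototypes, both marginals are uniform, the transport plan is computed with entropic-regularized Sinkhorn iterations, and each visible prototype is assigned to the infrared prototype that receives most of its mass. The function names and the `eps` and `n_iters` parameters are hypothetical and do not come from the paper.

```python
import math


def cosine_distance(u, v):
    """Cosine distance between two non-zero vectors, in [0, 2]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def sinkhorn_plan(cost, eps=0.05, n_iters=200):
    """Entropic-regularized transport plan with uniform marginals (assumed)."""
    n, m = len(cost), len(cost[0])
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    row_mass = [1.0 / n] * n  # mass leaving each visible prototype
    col_mass = [1.0 / m] * m  # mass arriving at each infrared prototype
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [row_mass[i] / sum(kernel[i][j] * v[j] for j in range(m))
             for i in range(n)]
        v = [col_mass[j] / sum(kernel[i][j] * u[i] for i in range(n))
             for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def match_prototypes(vis_protos, ir_protos, eps=0.05):
    """For each visible prototype, return the index of its infrared match."""
    cost = [[cosine_distance(p, q) for q in ir_protos] for p in vis_protos]
    plan = sinkhorn_plan(cost, eps=eps)
    # Assign each visible cluster to the infrared cluster with the most mass.
    return [max(range(len(row)), key=row.__getitem__) for row in plan]


# Toy 2-D prototypes (made-up values, not data from the paper).
vis_protos = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
ir_protos = [[0.1, 0.9], [-0.9, 0.1], [0.9, 0.1]]
print(match_prototypes(vis_protos, ir_protos))
```

Compared with a per-row nearest-neighbor assignment, the marginal constraints spread mass over all infrared prototypes, which discourages many visible clusters from collapsing onto the same infrared cluster. Whether RPNR uses these particular marginals or this assignment rule is not stated in the text above.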

Paper Structure (19 sections, 23 equations, 6 figures, 2 tables)

This paper contains 19 sections, 23 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: Overview of the proposed RPNR (best viewed in color). Given unlabeled visible-infrared data, RPNR first generates modality-specific pseudo-labels with DBSCAN at stage (a). RPNR then calibrates noisy pseudo-labels (grey dots) to obtain robust pseudo-labels (colored dots) at stage (b), while modeling potential interactions between all samples (strength is indicated by line thickness) to reduce intra-class variation at stage (c). Based on the robust pseudo-labels, RPNR employs optimal transport to establish cross-modality correspondences at stage (d). Finally, with the recast aligned pseudo-labels, RPNR mixes the two modality-specific memories into a new hybrid memory to learn both modality-specific and modality-invariant information through a contrastive loss at stage (e).
  • Figure 2: The ARI metric of visible and infrared pseudo-labels on SYSU-MM01 at each epoch.
  • Figure 3: The accuracy of cross-modality correspondences compared with PGM on SYSU-MM01.
  • Figure 4: The influence of three important hyper-parameters with different values on SYSU-MM01.
  • Figure 5: Four clustering evaluation metrics compared with state-of-the-art methods on the SYSU-MM01 dataset. "RGB" and "IR" denote the accuracy of visible and infrared pseudo-labels, respectively.
  • ...and 1 more figures
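
Figure 2 reports the ARI (Adjusted Rand Index) of the pseudo-labels, a standard measure of agreement between a clustering and ground-truth identities that is corrected for chance. As a reference for how such a score can be computed, the following sketch implements the usual pair-counting formula with the standard library only. The function name is hypothetical, and how the paper handles DBSCAN outliers when computing ARI is not stated here; this sketch simply treats every label value, including an outlier label such as -1, as an ordinary cluster id.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two labelings of the same samples."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same samples")
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two samples: trivially identical partitions

    # Pairs of samples grouped together in both, in truth, and in prediction.
    contingency = Counter(zip(labels_true, labels_pred))
    together_both = sum(comb(c, 2) for c in contingency.values())
    together_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    together_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = together_true * together_pred / total_pairs
    maximum = 0.5 * (together_true + together_pred)
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both labelings are one cluster
    return (together_both - expected) / (maximum - expected)


# Pseudo-labels that match the identities up to renaming score 1.0.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

A score of 1.0 means the pseudo-labels reproduce the true identity partition exactly, values near 0 correspond to chance-level agreement, and negative values indicate agreement worse than chance.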