Modality-Aware Bias Mitigation and Invariance Learning for Unsupervised Visible-Infrared Person Re-Identification

Menglin Wang, Xiaojin Gong, Jiachen Li, Genlin Ji

TL;DR

The paper tackles unsupervised cross-modality person re-identification by addressing modality-induced bias and intra-cluster variance. It introduces a modality-aware Jaccard distance for global cross-modality association and a split-and-contrast strategy with modality-specific global prototypes to learn modality-invariant, ID-discriminative representations. Training proceeds in two stages (intra-modality clustering followed by bias-mitigated global clustering) and uses a multi-positive contrastive loss to align the two modalities within each global cluster. Results on SYSU-MM01 and RegDB show state-of-the-art performance among unsupervised VI-ReID approaches, supporting the effectiveness of bias mitigation and invariance learning for cross-modality matching.
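To make the split-and-contrast idea concrete, the following is a minimal sketch rather than the authors' implementation. It assumes each global cluster is split by modality into one mean prototype per (cluster, modality) pair, and that a feature treats every prototype of its own cluster as a positive in a softmax-style contrastive loss. The function names `split_prototypes` and `multi_positive_contrastive_loss`, the temperature of 0.05, and the averaging over positives are all illustrative assumptions; this summary does not state the exact loss.

```python
import math


def split_prototypes(feats, cluster_ids, modalities):
    """Split each global cluster by modality and return one L2-normalized
    mean prototype per (cluster_id, modality) pair (assumed design)."""
    groups = {}
    for feat, cid, mod in zip(feats, cluster_ids, modalities):
        groups.setdefault((cid, mod), []).append(feat)
    prototypes = {}
    for key, members in groups.items():
        mean = [sum(col) / len(members) for col in zip(*members)]
        norm = math.sqrt(sum(x * x for x in mean)) or 1.0
        prototypes[key] = [x / norm for x in mean]
    return prototypes


def multi_positive_contrastive_loss(feature, cluster_id, prototypes, temperature=0.05):
    """Average negative log-softmax over all prototypes that share the
    feature's global cluster (one positive per modality in that cluster)."""
    keys = list(prototypes)
    logits = [
        sum(a * b for a, b in zip(feature, prototypes[k])) / temperature
        for k in keys
    ]
    # Numerically stable log-sum-exp over all prototypes (the denominator).
    top = max(logits)
    log_denom = top + math.log(sum(math.exp(l - top) for l in logits))
    positives = [l for l, (cid, _) in zip(logits, keys) if cid == cluster_id]
    if not positives:
        raise ValueError("no prototype found for this cluster id")
    return -sum(l - log_denom for l in positives) / len(positives)
```

Under these assumptions, a visible feature is pulled toward both the visible and the infrared prototype of its cluster, which is one way to encourage modality-invariant features while other clusters' prototypes act as negatives.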

Abstract

Unsupervised visible-infrared person re-identification (USVI-ReID) aims to match individuals across visible and infrared cameras without relying on any annotation. Given the significant gap between the visible and infrared modalities, estimating reliable cross-modality associations is a major challenge in USVI-ReID. Existing methods usually adopt optimal transport to associate intra-modality clusters, which is prone to propagating local cluster errors and overlooks global instance-level relations. By mining and attending to the visible-infrared modality bias, this paper addresses cross-modality learning from two aspects: bias-mitigated global association and modality-invariant representation learning. Motivated by camera-aware distance rectification in single-modality re-ID, we propose a modality-aware Jaccard distance to mitigate the distance bias caused by modality discrepancy, so that more reliable cross-modality associations can be estimated through global clustering. To further improve cross-modality representation learning, a 'split-and-contrast' strategy is designed to obtain modality-specific global prototypes. By explicitly aligning these prototypes under the guidance of the global association, modality-invariant yet ID-discriminative representation learning can be achieved. While conceptually simple, our method obtains state-of-the-art performance on benchmark VI-ReID datasets and outperforms existing methods by a significant margin, validating its effectiveness.
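The modality-aware Jaccard distance can be illustrated with a small sketch. This is not the paper's exact formulation, which is not given in this summary: it assumes that the rectification draws the same number of nearest neighbors from each modality before comparing neighbor sets, and it uses plain k-nearest-neighbor sets instead of the k-reciprocal sets common in re-ID re-ranking. The names `balanced_neighbors` and `modality_aware_jaccard` and the parameter `k` are hypothetical.

```python
import math


def balanced_neighbors(i, feats, modalities, k):
    """Neighbor set of sample i built from its k nearest samples in EACH
    modality, so neither modality dominates the set (assumed rebalancing)."""
    neighbors = set()
    for mod in set(modalities):
        candidates = [j for j in range(len(feats)) if modalities[j] == mod]
        candidates.sort(key=lambda j: math.dist(feats[i], feats[j]))
        neighbors.update(candidates[:k])
    return neighbors


def modality_aware_jaccard(feats, modalities, k=2):
    """Pairwise Jaccard distance between modality-balanced neighbor sets."""
    n = len(feats)
    sets = [balanced_neighbors(i, feats, modalities, k) for i in range(n)]
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            inter = len(sets[i] & sets[j])
            union = len(sets[i] | sets[j])  # never empty: each set holds its own sample
            dist[i][j] = 1.0 - inter / union
    return dist
```

With a modality-agnostic k-nearest-neighbor set, a visible and an infrared image of the same person often share no neighbors, because each image's nearest neighbors come from its own modality. Balancing the sets per modality gives such pairs a chance to overlap, so a global clustering step run on this distance can form modality-mixed clusters.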

Paper Structure

This paper contains 34 sections, 8 equations, 10 figures, 6 tables, 1 algorithm.

Figures (10)

  • Figure 1: Illustration of the KNN distribution and feature visualization, using the intra-modality (stage-1) trained model.
  • Figure 2: An overview of the proposed modality-aware learning framework. For cross-modality learning, a bias-mitigating global association is proposed, featuring the modality-aware neighbor rebalancing strategy to rectify the pairwise distance and overcome the modality-induced bias. The rectified distance is utilized by global clustering to obtain modality-mixed clusters. A split-and-contrast design captures the variance of global clusters, then a multi-positive contrastive loss is designed to facilitate the learning of modality-invariant yet ID-discriminative representation.
  • Figure 3: Intra-cluster feature distance distribution on SYSU-MM01. Features are extracted with the intra-modality trained model.
  • Figure 4: Comparison of clustering accuracy, measured by ARI (Adjusted Rand Index), across different methods.
  • Figure 5: t-SNE visualization of features from 10 randomly selected identities in the SYSU-MM01 training set. Different colors represent different identities. Circles and triangles denote visible and infrared images, respectively.
  • ...and 5 more figures
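
Figure 4 reports clustering accuracy as ARI. For readers unfamiliar with the metric, the sketch below computes the standard Adjusted Rand Index from two label lists using only the Python standard library. The function name `adjusted_rand_index` is illustrative, and the conventions for degenerate inputs (fewer than two samples, or trivially identical partitions) are assumptions.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two labelings of the same samples."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have the same length")
    n = len(labels_true)
    if n < 2:
        return 1.0  # assumed convention: no pairs to disagree on
    # Pairs of samples placed together in both partitions.
    together = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    # Pairs placed together within each partition on its own.
    pairs_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    pairs_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = pairs_true * pairs_pred / comb(n, 2)
    maximum = (pairs_true + pairs_pred) / 2
    if maximum == expected:
        return 1.0  # assumed convention for trivial, identical partitions
    return (together - expected) / (maximum - expected)
```

The score is 1 when the predicted clusters match the ground-truth identities up to a relabeling, and is close to 0 for random assignments, which makes it suitable for comparing pseudo-label quality across clustering methods.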