Visible-Infrared Person Re-Identification via Patch-Mixed Cross-Modality Learning

Zhihao Qian, Yutian Lin, Bo Du

TL;DR

This work tackles the cross-modality challenge of VI-ReID by introducing Patch-Mixed Cross-Modality Learning (PMCM), which forms patch-level mixtures of RGB and IR images to reveal semantic correspondences without creating new data distributions. It couples a baseline two-stream backbone with center alignment, part-based learning, and a novel patch-mixed modality loss (PMML) to enforce cross-modality invariance and part-global consistency. The approach achieves state-of-the-art results on SYSU-MM01 and competitive performance on RegDB, with extensive ablations validating the effectiveness of each component and the balance controlled by the patch-mix ratio. Overall, PMCM provides a practical, data-efficient path to mitigating modality gaps and data imbalance in VI-ReID, with implications for robust cross-camera re-identification in real-world surveillance.

Abstract

Visible-infrared person re-identification (VI-ReID) aims to retrieve images of the same pedestrian across different modalities, where the main challenge is the significant modality discrepancy. To alleviate the modality gap, recent methods generate intermediate images using GANs, grayscaling, or mixup strategies. However, these methods may introduce an extra data distribution, and the semantic correspondence between the two modalities is not well learned. In this paper, we propose a Patch-Mixed Cross-Modality framework (PMCM), in which two images of the same person from the two modalities are split into patches and stitched into a new image for model learning. A part-alignment loss is introduced to regularize representation learning, and a patch-mixed modality learning loss is proposed to align the modalities. In this way, the model learns to recognize a person through patches of different styles, so that the semantic correspondence between modalities can be inferred. In addition, because the image generation strategy is flexible, the ratio of patches from each modality in a patch-mixed image can be adjusted freely, which can further alleviate the modality imbalance problem. On two VI-ReID datasets, we report new state-of-the-art performance with the proposed method.
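
The patch-mixing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `patch_mix`, the square non-overlapping patch grid, and the choice to draw each patch independently from the visible image with probability `p` are all assumptions made for illustration. Images are plain nested lists so the example needs only the standard library.

```python
import random


def patch_mix(rgb, ir, patch, p, rng=None):
    """Stitch a patch-mixed image from an aligned RGB/IR pair of one person.

    rgb, ir: 2-D lists (H x W) of pixel values with identical shape.
    patch:   side length of the square patches (must divide H and W).
    p:       probability that a patch is taken from the visible image
             (assumption: patches are sampled independently).
    """
    rng = rng or random.Random()
    h, w = len(rgb), len(rgb[0])
    if (len(ir), len(ir[0])) != (h, w):
        raise ValueError("rgb and ir must have the same shape")
    if h % patch or w % patch:
        raise ValueError("patch size must divide both image dimensions")

    mixed = [row[:] for row in ir]  # start from the infrared image
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            if rng.random() < p:  # take this patch from the visible image
                for y in range(top, top + patch):
                    mixed[y][left:left + patch] = rgb[y][left:left + patch]
    return mixed
```

With `p = 0` the result is the infrared image and with `p = 1` it is the visible image; intermediate values change the expected share of visible patches, which is the knob the abstract relates to modality imbalance.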

Paper Structure (32 sections, 10 equations, 8 figures, 6 tables)

This paper contains 32 sections, 10 equations, 8 figures, 6 tables.

Figures (8)

  • Figure 1: Different methods of generating the intermediate modality. (a) Grayscale images are generated from visible images, (b) a mixed image is a global mixture of corresponding RGB and IR images, and (c) our proposed patch-mixed image, where each patch comes from either RGB or IR, helps to infer the semantic correspondence between the two modalities and relieves the modality imbalance problem.
  • Figure 2: (a) Framework of the proposed PMCM. The patch-mixed image and the original images are fed together into the backbone network to extract features. After pooling, the obtained features are jointly optimized by the baseline loss, center-to-center loss, and patch-mixed modality learning loss. (b) Center-to-center (C2C) loss attempts to reduce the distance between the identity centers of any two modalities. (c) Patch-Mixed Modality Learning (PMML) loss aims to align the prediction distributions of the patch-mixed modality with those of the other two modalities, where both global and local features are considered.
  • Figure 3: Patch-mixed images with different mix ratios $p$. When $p=0$, the image is composed of only an infrared image. As $p$ increases, more visible patches are adopted.
  • Figure 4: Influence of the patch-mix ratio $p$: (a) experiments on SYSU-MM01 in all-search and single-shot mode, and (b) experiments on RegDB in Infrared2Visible mode.
  • Figure 5: Analysis of the Rank-1 accuracy with parameters $\lambda_1$, $\lambda_2$ and $\lambda_3$. We keep the other parameters constant while testing the target one; the results show that our PMCM is insensitive to these parameters.
  • ...and 3 more figures
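
The center-to-center idea in the Figure 2(b) caption, pulling together the identity centers of any two modalities, can be sketched as below. This is a hypothetical illustration only: the function names, the squared Euclidean distance, and the averaging over identity/modality-pair combinations are assumptions, and the paper's exact formulation may differ.

```python
from itertools import combinations


def identity_centers(features, labels):
    """Return the mean feature vector for each identity label."""
    sums, counts = {}, {}
    for f, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(f)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], f)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def c2c_loss(feats_by_modality, labels_by_modality):
    """Average squared distance between same-identity centers of modality pairs.

    Both arguments are dicts keyed by modality name (e.g. visible, infrared,
    patch-mixed); the distance and averaging are illustrative assumptions.
    """
    centers = {
        m: identity_centers(feats_by_modality[m], labels_by_modality[m])
        for m in feats_by_modality
    }
    total, n = 0.0, 0
    for a, b in combinations(sorted(centers), 2):
        # Only identities present in both modalities contribute.
        for y in centers[a].keys() & centers[b].keys():
            total += sum((u - v) ** 2 for u, v in zip(centers[a][y], centers[b][y]))
            n += 1
    return total / n if n else 0.0
```

The loss is zero when every identity has the same center in all modalities and grows as the centers drift apart, which matches the caption's stated goal of reducing the distance between identity centers.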