Visible-Infrared Person Re-Identification via Patch-Mixed Cross-Modality Learning
Zhihao Qian, Yutian Lin, Bo Du
TL;DR
This work tackles the modality gap in visible-infrared person re-identification (VI-ReID) by introducing Patch-Mixed Cross-Modality Learning (PMCM), which stitches patches from RGB and IR images of the same person into mixed images, exposing semantic correspondences without introducing an extra data distribution. It couples a two-stream baseline backbone with center alignment, part-based learning, and a patch-mixed modality learning (PMML) loss to enforce cross-modality invariance and part-global consistency. The approach achieves state-of-the-art results on SYSU-MM01 and competitive performance on RegDB, and extensive ablations validate each component and the effect of the patch-mix ratio. Overall, PMCM offers a practical, data-efficient way to mitigate the modality gap and modality imbalance in VI-ReID, with implications for robust cross-camera re-identification in real-world surveillance.
Abstract
Visible-infrared person re-identification (VI-ReID) aims to retrieve images of the same pedestrian across modalities, where the main challenge is the significant modality discrepancy. To alleviate the modality gap, recent methods generate intermediate images using GANs, grayscaling, or mixup strategies. However, these methods can introduce an extra data distribution, and they do not learn the semantic correspondence between the two modalities well. In this paper, we propose a Patch-Mixed Cross-Modality framework (PMCM), in which two images of the same person from the two modalities are split into patches and stitched into a new image for model learning. A part-alignment loss is introduced to regularize representation learning, and a patch-mixed modality learning loss is proposed to align the two modalities. In this way, the model learns to recognize a person from patches of different styles, so that the semantic correspondence between modalities can be inferred. In addition, because the image generation strategy is flexible, the ratio of patches from each modality in the patch-mixed images can be adjusted freely, which further alleviates the modality imbalance problem. On two VI-ReID datasets, we report new state-of-the-art performance with the proposed method.
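
The patch-mixing step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `patch_mix`, the parameters `patch_size` and `ir_ratio`, the uniform random choice of patch positions, and the assumption that both images are already aligned to the same height, width, and channel count (for example, the IR image replicated to three channels) are all hypothetical choices made for illustration.

```python
import numpy as np


def patch_mix(rgb, ir, patch_size=16, ir_ratio=0.5, rng=None):
    """Stitch a patch-mixed image from an RGB and an IR image (H x W x C).

    Hypothetical helper: a fraction `ir_ratio` of the grid cells is filled
    from the IR image, the remaining cells from the RGB image.
    """
    if rgb.shape != ir.shape:
        raise ValueError("rgb and ir must have the same shape")
    h, w = rgb.shape[:2]
    if h % patch_size or w % patch_size:
        raise ValueError("height and width must be divisible by patch_size")
    rng = np.random.default_rng() if rng is None else rng

    rows, cols = h // patch_size, w // patch_size
    n_patches = rows * cols
    n_ir = int(round(ir_ratio * n_patches))

    # Pick which grid cells take their pixels from the IR image.
    mask = np.zeros(n_patches, dtype=bool)
    mask[rng.choice(n_patches, size=n_ir, replace=False)] = True
    mask = mask.reshape(rows, cols)

    # Expand the per-patch mask to per-pixel resolution.
    pixel_mask = mask.repeat(patch_size, axis=0).repeat(patch_size, axis=1)

    # Take IR pixels where the mask is set, RGB pixels elsewhere.
    mixed = np.where(pixel_mask[..., None], ir, rgb)
    return mixed, mask


if __name__ == "__main__":
    rgb = np.zeros((32, 16, 3), dtype=np.float32)  # stand-in for a visible image
    ir = np.ones((32, 16, 3), dtype=np.float32)    # stand-in for an infrared image
    mixed, mask = patch_mix(rgb, ir, patch_size=8, ir_ratio=0.25,
                            rng=np.random.default_rng(0))
    print(mixed.shape, int(mask.sum()), "of", mask.size, "patches from IR")
```

Varying `ir_ratio` in this sketch corresponds to adjusting the ratio of patches from each modality, which the abstract credits with alleviating modality imbalance; how the paper actually samples patch positions and ratios is not specified here.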
