WRIM-Net: Wide-Ranging Information Mining Network for Visible-Infrared Person Re-Identification
Yonggan Wu, Ling-Chao Meng, Yuan Zichao, Sixian Chan, Hong-Qiang Wang
TL;DR
This paper tackles cross-modality gaps in visible–infrared person re-identification by introducing WRIM-Net, a framework that mines wide-ranging information through a Multi-dimension Interactive Information Mining (MIIM) module and an Auxiliary-Information-based Contrastive Learning (AICL) approach. MIIM enables non-local spatial and channel interactions, with separate modules in shallow layers for specific-modality information and a shared module in deeper layers for shared-modality information, boosted by Global Region Interaction. AICL leverages Cross-Modality Key-Instance Contrastive (CMKIC) loss to pull same-ID samples across modalities closer while challenging the model with top-K difficult positives, supplemented by auxiliary information from earlier blocks. The method achieves state-of-the-art results on SYSU-MM01, RegDB, and LLCM, demonstrating strong improvements in cross-modality invariant feature learning and practical VI-ReID performance.
Abstract
For the visible-infrared person re-identification (VI-ReID) task, one of the primary challenges lies in the significant cross-modality discrepancy. Existing methods struggle to mine modality-invariant information. They often focus on a single dimension, such as spatial or channel, and overlook the extraction of specific-modality multi-dimension information. To fully mine modality-invariant information across a wide range, we introduce the Wide-Ranging Information Mining Network (WRIM-Net), which mainly comprises a Multi-dimension Interactive Information Mining (MIIM) module and an Auxiliary-Information-based Contrastive Learning (AICL) approach. Empowered by the proposed Global Region Interaction (GRI), MIIM comprehensively mines non-local spatial and channel information through intra-dimension interaction. Moreover, thanks to its low computational complexity, separate MIIM modules can be positioned in shallow layers, enabling the network to better mine specific-modality multi-dimension information. AICL, by introducing the novel Cross-Modality Key-Instance Contrastive (CMKIC) loss, effectively guides the network in extracting modality-invariant information. We conduct extensive experiments not only on the well-known SYSU-MM01 and RegDB datasets but also on the latest large-scale cross-modality LLCM dataset. The results demonstrate WRIM-Net's superiority over state-of-the-art methods.
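
To make the idea behind a cross-modality contrastive loss with hard positives more concrete, the following is a minimal sketch in plain Python. It is not the CMKIC loss as defined in the paper: the InfoNCE-style form, the cosine similarity, the function name `hard_positive_contrastive_loss`, and the parameters `top_k` and `temperature` are all assumptions chosen for illustration. It only shows the general pattern summarized in the TL;DR, where same-ID samples from the other modality are pulled closer and the least-similar positives are emphasized.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def hard_positive_contrastive_loss(anchors, anchor_ids, gallery, gallery_ids,
                                   top_k=1, temperature=0.1):
    """Hypothetical InfoNCE-style loss from one modality to the other.

    anchors / gallery: lists of feature vectors from two different modalities
    (e.g. visible and infrared). For each anchor, only the top_k least-similar
    same-ID gallery samples (the "hard positives") contribute a loss term, and
    all different-ID gallery samples act as negatives.
    """
    losses = []
    for feat, pid in zip(anchors, anchor_ids):
        sims = [cosine(feat, g) for g in gallery]
        # Sort ascending so the hardest (least similar) positives come first.
        positives = sorted(s for s, gid in zip(sims, gallery_ids) if gid == pid)
        negatives = [s for s, gid in zip(sims, gallery_ids) if gid != pid]
        if not positives or not negatives:
            continue  # this anchor has no usable pairs in the batch
        neg_sum = sum(math.exp(s / temperature) for s in negatives)
        for s in positives[:top_k]:
            pos_term = math.exp(s / temperature)
            losses.append(-math.log(pos_term / (pos_term + neg_sum)))
    return sum(losses) / len(losses) if losses else 0.0


# Toy usage: two visible anchors and three infrared gallery features.
visible = [[1.0, 0.0], [0.0, 1.0]]
visible_ids = [0, 1]
infrared = [[0.9, 0.1], [0.6, 0.4], [0.1, 0.9]]
infrared_ids = [0, 0, 1]
loss = hard_positive_contrastive_loss(visible, visible_ids,
                                      infrared, infrared_ids, top_k=1)
```

In this sketch, a smaller `top_k` concentrates the loss on the most difficult cross-modality positives, while a larger value averages in easier ones. A practical implementation would operate on batched tensors in a deep learning framework and would typically apply the loss in both directions (visible to infrared and infrared to visible).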
