MSCMNet: Multi-scale Semantic Correlation Mining for Visible-Infrared Person Re-Identification

Xuecheng Hua, Ke Cheng, Hu Lu, Juanjuan Tu, Yuanquan Wang, Shitong Wang

TL;DR

MSCMNet tackles cross-modality gaps in VI-ReID by simultaneously mining semantic correlations across multiple scales and preserving modality-specific information. It introduces a quadruple-stream feature extractor (QFE), a multi-scale information correlation mining block (MIMB) with an ALB-based attention mechanism, and a Quadruple Center Triplet Loss (QCT) that combines cross-modal and intra-modality constraints with a negative-margin term. The approach yields state-of-the-art results on the SYSU-MM01, RegDB, and LLCM datasets, demonstrating the effectiveness of multi-scale semantic information and cross-modal center constraints for robust VI-ReID. In practice, the method enables more reliable person re-identification across visible-light and infrared surveillance cameras, especially under challenging lighting and viewpoint variations, and the MIMB framework could be adapted to other cross-modal recognition tasks.

Abstract

The main challenge in Visible-Infrared Person Re-Identification (VI-ReID) is extracting discriminative features from different modalities for matching. Existing works primarily focus on minimizing modal discrepancies, so modality information is not fully leveraged. To solve this problem, a Multi-scale Semantic Correlation Mining network (MSCMNet) is proposed to comprehensively exploit semantic features at multiple scales while keeping the loss of modality information during feature extraction as small as possible. The proposed network contains three novel components. First, to make effective use of modality information, the Multi-scale Information Correlation Mining Block (MIMB) is designed to explore semantic correlations across multiple scales. Second, to enrich the semantic information that MIMB can utilize, a quadruple-stream feature extractor (QFE) with non-shared parameters is designed to extract information from different dimensions of the dataset. Finally, the Quadruple Center Triplet Loss (QCT) is proposed to address the information discrepancy in the comprehensive features. Extensive experiments on the SYSU-MM01, RegDB, and LLCM datasets demonstrate that the proposed MSCMNet achieves the highest accuracy.
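
The summary describes QCT only at a high level, as a loss built on cross-modal and intra-modality center constraints. To make the idea of a cross-modal center constraint concrete, here is a minimal sketch of a generic center-based triplet loss between visible and infrared identity centers. It is not the paper's QCT formulation: the function names, the Euclidean distance, the hardest-negative selection, and the margin of 0.3 are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def class_centers(features, labels):
    """Return the mean feature vector for each identity label."""
    sums = {}
    counts = defaultdict(int)
    for feat, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(feat)
        sums[label] = [s + x for s, x in zip(sums[label], feat)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cross_modal_center_triplet(vis_feats, vis_labels, ir_feats, ir_labels,
                               margin=0.3):
    """Hypothetical center triplet loss across two modalities.

    For each identity, the anchor is its center in one modality, the
    positive is the same identity's center in the other modality, and the
    negative is the closest center of a different identity in the other
    modality. The hinge term is averaged over both anchor directions.
    """
    vis_centers = class_centers(vis_feats, vis_labels)
    ir_centers = class_centers(ir_feats, ir_labels)
    shared = sorted(set(vis_centers) & set(ir_centers))

    losses = []
    for anchors, others in ((vis_centers, ir_centers),
                            (ir_centers, vis_centers)):
        for pid in shared:
            negatives = [euclidean(anchors[pid], others[q])
                         for q in shared if q != pid]
            if not negatives:
                continue  # a single identity provides no negative center
            positive = euclidean(anchors[pid], others[pid])
            losses.append(max(0.0, positive - min(negatives) + margin))
    return sum(losses) / len(losses) if losses else 0.0
```

When the centers of the same identity lie close together across modalities and far from other identities, every hinge term is zero; swapping the infrared labels in a toy batch makes the loss positive, which is the behavior a center constraint of this kind is meant to penalize.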

Paper Structure

This paper contains 26 sections, 17 equations, 11 figures, 4 tables, 1 algorithm.

Figures (11)

  • Figure 1: Motivation: Due to the semantic misalignment between features within the same layer of the network, some valuable modality information cannot be utilized. However, the correlation of this semantic information can be explored at multiple scales, which enables the extraction of more comprehensive personal features. Therefore, conducting a multi-scale exploration of valuable semantic information is crucial. Our MSCMNet effectively achieves this objective.
  • Figure 2: The framework of our proposed method: (a) The overall structure of visible-infrared Multi-scale Semantic Correlation Mining network (MSCMNet), which is based on the Residual Neural Network. MSCMNet contains three components: Quadruple-Stream Feature Extractors (QFE), Multi-scale Information Correlation Mining Block (MIMB), and the total loss. (b) The conceptual illustration of the designed Information Adoptive Lossless Block (ALB), which is the component of MIMB. (c) The diagram of the loss function contains quadruple center loss (QC), dual center loss (DC), and negative margin loss (NM).
  • Figure 3: (a) The dataset extended through data augmentation. (b) Our quadruple-stream feature extractor, consisting of four convolutional layers with non-shared parameters.
  • Figure 4: The QC Loss consists of the dual center loss (b) and quar center loss (c). The dual center is employed to alleviate the significant differences between modalities, while the quar center enables the comprehensive utilization of information contained within the four-stream network.
  • Figure 5: Ablation study for MIMB with different numbers of the ALB and the effectiveness of multi-scale structures.
  • ...and 6 more figures