
Identity Clue Refinement and Enhancement for Visible-Infrared Person Re-Identification

Guoqing Zhang, Zhun Wang, Hairui Wang, Zhonglin Ye, Yuhui Zheng

TL;DR

This work tackles cross-modal VI-ReID by shifting the focus from learning only modality-invariant representations to also preserving the modality-specific identity cues embedded in shallow features. The proposed ICRE network combines a Multi-Perception Feature Refinement (MPFR) module, which adaptively aggregates shallow, modality-specific attributes, with a Semantic Distillation Cascade Enhancement (SDCE) module, which distills these cues into deeper, modality-invariant features through a two-stage transformer architecture. An Identity Clues Guided (ICG) loss then aligns cross-modal representations using modality-aware centers, improving intra-class compactness across modalities. On SYSU-MM01, LLCM, and RegDB, ICRE delivers results ranging from competitive to state-of-the-art, and ablations confirm the effectiveness of the MPFR and SDCE components and the stability benefits of the ICG loss.
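The idea of aligning modalities through modality-aware centers can be illustrated with a generic objective in the spirit of the hetero-center loss: for each identity, average its visible features and its infrared features separately, then penalize the squared distance between the two centers. This is a minimal sketch under that assumption, not the paper's ICG loss, whose exact form this summary does not give. The modality tags `"vis"`/`"ir"` and the function names are hypothetical.

```python
from collections import defaultdict


def modality_centers(features, labels, modalities):
    """Average the feature vectors of each (identity, modality) pair."""
    sums = {}
    counts = defaultdict(int)
    for feat, pid, mod in zip(features, labels, modalities):
        key = (pid, mod)
        if key not in sums:
            sums[key] = [0.0] * len(feat)
        sums[key] = [s + f for s, f in zip(sums[key], feat)]
        counts[key] += 1
    return {key: [s / counts[key] for s in vec] for key, vec in sums.items()}


def center_alignment_loss(features, labels, modalities):
    """Mean squared Euclidean distance between each identity's visible and
    infrared centers (hypothetical tags "vis"/"ir"); identities that appear
    in only one modality within the batch are skipped."""
    centers = modality_centers(features, labels, modalities)
    losses = []
    for pid in sorted({pid for pid, _ in centers}):
        vis = centers.get((pid, "vis"))
        ir = centers.get((pid, "ir"))
        if vis is None or ir is None:
            continue
        losses.append(sum((a - b) ** 2 for a, b in zip(vis, ir)))
    return sum(losses) / len(losses) if losses else 0.0
```

For example, two identities whose visible and infrared centers each lie 2 units apart give a loss of 4.0; shrinking that gap is what "improving intra-class compactness across modalities" means in this toy setting.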

Abstract

Visible-Infrared Person Re-Identification (VI-ReID) is a challenging cross-modal matching task due to significant modality discrepancies. Current methods mainly learn modality-invariant features through unified embedding spaces, but they often attend only to the discriminative semantics shared across modalities and disregard the critical role of modality-specific identity-aware knowledge in discriminative feature learning. To bridge this gap, we propose a novel Identity Clue Refinement and Enhancement (ICRE) network that mines and utilizes the implicit discriminative knowledge inherent in modality-specific attributes. First, we design a Multi-Perception Feature Refinement (MPFR) module that aggregates shallow features from the shared branches to capture modality-specific attributes that are easily overlooked. Then, we propose a Semantic Distillation Cascade Enhancement (SDCE) module, which distills identity-aware knowledge from the aggregated shallow features and guides the learning of modality-invariant features. Finally, we propose an Identity Clues Guided (ICG) loss to alleviate the modality discrepancies within the enhanced features and promote the learning of a diverse representation space. Extensive experiments on multiple public datasets show that ICRE outperforms existing state-of-the-art (SOTA) methods.
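The abstract describes two steps: aggregating shallow features (MPFR) and using them to guide deeper, modality-invariant features (SDCE). The toy sketch below illustrates one plausible reading under loudly stated assumptions: shallow stage outputs are already pooled to equal-length vectors, they are fused with softmax-normalized per-stage weights, and deep tokens then attend to shallow tokens through single-head scaled dot-product cross-attention with identity Q/K/V projections and a residual connection. None of this is the paper's actual MPFR or SDCE design, and all function names are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_shallow_features(stage_features, stage_logits):
    """Assumed MPFR-like step: fuse per-stage shallow vectors (all the same
    length) with softmax-normalized weights; the logits would be learned."""
    weights = softmax(stage_logits)
    dim = len(stage_features[0])
    return [sum(w * feat[i] for w, feat in zip(weights, stage_features))
            for i in range(dim)]


def cross_attention_enhance(deep_tokens, shallow_tokens):
    """Assumed SDCE-like step: each deep token attends to the shallow tokens
    and adds the attended context back as a residual. Q/K/V projections are
    the identity for brevity; a real model would learn them."""
    d = len(deep_tokens[0])  # shallow tokens are assumed to share this size
    enhanced = []
    for q in deep_tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in shallow_tokens]
        attn = softmax(scores)
        context = [sum(w * v[i] for w, v in zip(attn, shallow_tokens))
                   for i in range(d)]
        enhanced.append([qi + ci for qi, ci in zip(q, context)])
    return enhanced
```

For example, `aggregate_shallow_features([[1.0, 2.0], [3.0, 4.0]], [0.0, 0.0])` weights both stages equally and returns `[2.0, 3.0]`; the fused vector can then be passed as `cross_attention_enhance(deep_tokens, [fused])`. With a single shallow token the attention weight is 1, so each deep token simply receives that token as a residual.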

Paper Structure

This paper contains 17 sections, 12 equations, 7 figures, 9 tables.

Figures (7)

  • Figure 1: Our ICRE is motivated by the idea that modality-specific attributes retained in shallow features can effectively enhance intra-class compactness after removing modality interference.
  • Figure 2: (a) Overview of the proposed ICRE network, including a dual-stream backbone network, a Multi-Perception Feature Refinement (MPFR) module, and a Semantic Distillation Cascade Enhancement (SDCE) module. (b) and (c) illustrate the detailed processing flows of MPFR and SDCE, respectively. The network is ultimately constrained by both the identity (ID) loss and the Identity Clues Guided (ICG) loss.
  • Figure 3: Analysis of the effect of varying $\lambda$ and $\rho_{1}$ on the SYSU-MM01 dataset.
  • Figure 4: (a)-(c) show the distance distributions between cross-modal features. Blue and green represent the distance frequency distributions of intra-class and inter-class pairs, respectively, and the vertical line denotes the mean distance of each distribution. (d)-(f) show the feature space distribution. Distinct colors and shapes denote different identities and modalities, while dashed ellipses indicate areas of identity confusion.
  • Figure 5: Grad-CAM visualizations of discriminative regions on sample images. Each row of visible and infrared images belongs to the same identity, and baseline visualizations are included for comparison.