Identity Clue Refinement and Enhancement for Visible-Infrared Person Re-Identification
Guoqing Zhang, Zhun Wang, Hairui Wang, Zhonglin Ye, Yuhui Zheng
TL;DR
This work tackles cross-modal VI-ReID by shifting focus from solely learning modality-invariant representations to also preserving the modality-specific identity cues embedded in shallow features. The proposed ICRE network combines a Multi-Perception Feature Refinement (MPFR) module, which adaptively aggregates shallow, modality-specific attributes, with a Semantic Distillation Cascade Enhancement (SDCE) module, which distills these cues into deeper, modality-invariant features via a two-stage transformer architecture. An Identity Clues Guided (ICG) loss then aligns cross-modal representations using modality-aware centers, improving intra-class compactness across modalities. On SYSU-MM01, LLCM, and RegDB, ICRE delivers results that are competitive with or better than the state of the art, and ablations confirm the effectiveness of the MPFR and SDCE components and the stability benefits of the ICG loss.
Abstract
Visible-Infrared Person Re-Identification (VI-ReID) is a challenging cross-modal matching task due to significant modality discrepancies. Current methods mainly learn modality-invariant features through unified embedding spaces, but they often attend only to the discriminative semantics shared across modalities and disregard the critical role of modality-specific, identity-aware knowledge in discriminative feature learning. To bridge this gap, we propose a novel Identity Clue Refinement and Enhancement (ICRE) network to mine and utilize the implicit discriminative knowledge inherent in modality-specific attributes. First, we design a Multi-Perception Feature Refinement (MPFR) module that aggregates shallow features from shared branches, aiming to capture modality-specific attributes that are easily overlooked. Then, we propose a Semantic Distillation Cascade Enhancement (SDCE) module, which distills identity-aware knowledge from the aggregated shallow features and guides the learning of modality-invariant features. Finally, an Identity Clues Guided (ICG) loss is proposed to alleviate the modality discrepancies within the enhanced features and promote the learning of a diverse representation space. Extensive experiments on multiple public datasets show that the proposed ICRE outperforms existing state-of-the-art methods.
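
The ICG loss is described here only at a high level: it uses modality-aware centers to reduce the discrepancy between visible and infrared features of the same identity. The sketch below illustrates that general idea, not the paper's actual formulation. It assumes a simple squared-distance penalty between per-identity visible and infrared centers; the function names, the "visible"/"infrared" tags, and the choice of penalty are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

Vector = List[float]


def modality_centers(
    features: Sequence[Sequence[float]],
    labels: Sequence[Hashable],
    modalities: Sequence[str],
) -> Dict[Tuple[Hashable, str], Vector]:
    """Return the mean feature vector for each (identity, modality) pair."""
    sums: Dict[Tuple[Hashable, str], Vector] = {}
    counts: Dict[Tuple[Hashable, str], int] = defaultdict(int)
    for feat, pid, mod in zip(features, labels, modalities):
        key = (pid, mod)
        if key not in sums:
            sums[key] = [0.0] * len(feat)
        sums[key] = [s + x for s, x in zip(sums[key], feat)]
        counts[key] += 1
    return {key: [s / counts[key] for s in total] for key, total in sums.items()}


def center_alignment_loss(
    features: Sequence[Sequence[float]],
    labels: Sequence[Hashable],
    modalities: Sequence[str],
) -> float:
    """Mean squared distance between the visible and infrared centers
    of each identity (an assumed stand-in for a center-guided loss)."""
    centers = modality_centers(features, labels, modalities)
    distances = []
    for pid in {pid for pid, _ in centers}:
        vis = centers.get((pid, "visible"))
        ir = centers.get((pid, "infrared"))
        if vis is None or ir is None:
            continue  # identity seen in only one modality: nothing to align
        distances.append(sum((a - b) ** 2 for a, b in zip(vis, ir)))
    return sum(distances) / len(distances) if distances else 0.0


if __name__ == "__main__":
    feats = [[1.0, 0.0], [3.0, 0.0], [2.0, 2.0], [0.0, 0.0], [0.0, 0.0]]
    pids = [0, 0, 0, 1, 1]
    mods = ["visible", "visible", "infrared", "visible", "infrared"]
    print(center_alignment_loss(feats, pids, mods))
```

Minimizing a term of this kind pulls the two modality centers of each identity together, which is one way to obtain the intra-class compactness across modalities mentioned in the TL;DR. In practice such a term would be computed on batched tensors in a deep learning framework and combined with an identity classification loss.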
