Implicit Discriminative Knowledge Learning for Visible-Infrared Person Re-Identification

Kaijie Ren, Lei Zhang

TL;DR

This work tackles cross-modal VI-ReID by leveraging implicit discriminative information hidden in modality-specific features to boost modality-shared representations. It introduces a dual-stream network with an IN-guided Information Purifier to preserve identity cues while reducing style discrepancies, and distills implicit knowledge into the shared space through Triplet Graph Structure Alignment and Class Semantic Alignment, aided by a Modality Discrepancy Reduction loss. The approach yields state-of-the-art results on SYSU-MM01 and strong performance on RegDB and LLCM, demonstrating the effectiveness of utilizing modality-specific cues rather than discarding them. The combination of feature- and logit-level distillation, along with robust purification, offers a practical path to more accurate cross-modal person re-identification.

Abstract

Visible-Infrared Person Re-identification (VI-ReID) is a challenging cross-modal pedestrian retrieval task, due to significant intra-class variations and cross-modal discrepancies among different cameras. Existing works mainly focus on embedding images of different modalities into a unified space to mine modality-shared features. They only seek distinctive information within these shared features, while ignoring the identity-aware useful information that is implicit in the modality-specific features. To address this issue, we propose a novel Implicit Discriminative Knowledge Learning (IDKL) network to uncover and leverage the implicit discriminative information contained within the modality-specific features. First, we extract modality-specific and modality-shared features using a novel dual-stream network. Then, the modality-specific features undergo purification to reduce their modality style discrepancies while preserving identity-aware discriminative knowledge. Subsequently, this implicit knowledge is distilled into the modality-shared features to enhance their distinctiveness. Finally, an alignment loss is proposed to minimize the modality discrepancy on the enhanced modality-shared features. Extensive experiments on multiple public datasets demonstrate the superiority of the IDKL network over state-of-the-art methods. Code is available at https://github.com/1KK077/IDKL.
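
The purification step relies on instance normalization (IN) to reduce modality style discrepancies. The following is a minimal sketch of why IN helps, not the paper's purifier: it assumes, purely for illustration, that a modality "style" is a per-channel scale and shift applied to the same underlying content. The function names `instance_norm` and `purify` and the toy values are hypothetical.

```python
import math

def instance_norm(channel, eps=1e-5):
    """Normalize one channel of one sample to zero mean and unit variance."""
    mean = sum(channel) / len(channel)
    var = sum((v - mean) ** 2 for v in channel) / len(channel)
    return [(v - mean) / math.sqrt(var + eps) for v in channel]

def purify(feature_map):
    """Apply instance normalization to every channel of a single feature map.

    feature_map: list of channels, each a flat list of spatial activations.
    """
    return [instance_norm(ch) for ch in feature_map]

def max_abs_gap(map_a, map_b):
    """Largest element-wise absolute difference between two feature maps."""
    return max(abs(a - b) for ca, cb in zip(map_a, map_b) for a, b in zip(ca, cb))

# Toy content: one sample with two channels of four spatial positions each.
content = [[0.0, 1.0, 2.0, 3.0], [1.0, -1.0, 1.0, -1.0]]

# Assumed modality styles: a per-channel scale and shift of the same content.
visible = [[2.0 * v + 5.0 for v in ch] for ch in content]
infrared = [[0.5 * v - 1.0 for v in ch] for ch in content]

raw_gap = max_abs_gap(visible, infrared)
purified_gap = max_abs_gap(purify(visible), purify(infrared))
```

In this toy setting the gap between the two modalities nearly vanishes after normalization. Plain IN can also discard identity-relevant statistics, which is why the paper describes a purifier that is guided by IN while preserving identity-aware knowledge, rather than IN alone.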

Paper Structure (16 sections, 21 equations, 6 figures, 4 tables)

This paper contains 16 sections, 21 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: Previous methods focused on seeking discriminative information within modality-shared features, overlooking the discriminative clues implicit in modality-specific features. This implicit discriminative information can be used to enhance the shared invariant features.
  • Figure 2: Framework of the proposed Implicit Discriminative Knowledge Learning (IDKL) model. The dual one-stream network, built from ResNet blocks, first extracts the modality-specific feature $\boldsymbol{F}_{sp}$ and the modality-shared feature $\boldsymbol{F}_{sh}$ under the constraints of a modality discriminator and a modality confuser, respectively, while the common ReID loss provides the base optimization of the network. Then, the modality-specific feature is fed into the information purifier, which reduces the modality style discrepancy while preserving the implicit discriminative information, yielding the purified modality-specific feature $\widetilde{\boldsymbol{F}}_{sp}$. Subsequently, this implicit knowledge is distilled into the modality-shared feature through TGSA and CSA. Finally, $\mathcal{L}_{mdr}$ is applied to minimize the modality discrepancy within the enhanced modality-shared feature.
  • Figure 3: Illustration of the proposed TGSA loss: 'a' and 'b' denote two different types of features. '$\boldsymbol{A}$' represents the graph structure affinity matrix. After aligning the three affinity matrices, the discrepancy in the graph structure distribution between features 'a' and features 'b' will be eliminated.
  • Figure 4: Visualization of the implicit discriminative information with Grad-CAM. 'sh' and 'sp(im)' denote the modality-shared feature and the modality-specific feature of the trained IDKL w/o knowledge distillation, respectively; 'enhanced' denotes the modality-shared feature of IDKL w/ knowledge distillation.
  • Figure 5: Ablation analysis of the hyper-parameters $\lambda_1$, $\lambda_2$, and $\lambda_3$ for $\mathcal{L}_{ip}$, $\mathcal{L}_{tgsa}$, and $\mathcal{L}_{csa}$, respectively, on the SYSU-MM01 dataset.
  • ...and 1 more figures
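
The Figure 3 caption describes aligning three graph-structure affinity matrices computed from two feature types 'a' and 'b'. A rough sketch of that idea follows. It rests on assumptions that are not stated in the captions: the three matrices are taken to be the a-a, b-b, and a-b affinities, the affinity is cosine similarity, and the alignment penalty is a mean squared difference. The paper's actual TGSA formulation may differ, and all function names here are hypothetical.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def affinity(xs, ys):
    """Cosine-similarity affinity matrix between two batches of feature vectors."""
    xs = [l2_normalize(x) for x in xs]
    ys = [l2_normalize(y) for y in ys]
    return [[sum(p * q for p, q in zip(x, y)) for y in ys] for x in xs]

def mean_sq_diff(m1, m2):
    """Mean squared element-wise difference between two equally sized matrices."""
    diffs = [(p - q) ** 2 for r1, r2 in zip(m1, m2) for p, q in zip(r1, r2)]
    return sum(diffs) / len(diffs)

def graph_alignment_loss(feats_a, feats_b):
    """Penalize disagreement among the a-a, b-b, and a-b affinity matrices (assumed form)."""
    a_aa = affinity(feats_a, feats_a)
    a_bb = affinity(feats_b, feats_b)
    a_ab = affinity(feats_a, feats_b)
    return (mean_sq_diff(a_aa, a_bb)
            + mean_sq_diff(a_aa, a_ab)
            + mean_sq_diff(a_bb, a_ab))

# Toy batch of three 2-D features per feature type.
feats_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
feats_b = [[0.0, 1.0], [1.0, 0.0], [1.0, -1.0]]

loss_same = graph_alignment_loss(feats_a, feats_a)  # identical graph structure
loss_diff = graph_alignment_loss(feats_a, feats_b)  # mismatched graph structure
```

When both feature sets share the same pairwise structure, the three matrices coincide and the penalty is zero; it grows as the structures diverge, which matches the caption's statement that alignment removes the graph-structure discrepancy between 'a' and 'b'.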