Implicit Discriminative Knowledge Learning for Visible-Infrared Person Re-Identification

Kaijie Ren; Lei Zhang

TL;DR

This work tackles visible-infrared person re-identification (VI-ReID) by using the implicit discriminative information hidden in modality-specific features to strengthen modality-shared representations. It introduces a dual-stream network with an IN-guided Information Purifier that preserves identity cues while reducing style discrepancies, and distills the implicit knowledge into the shared space through Triplet Graph Structure Alignment and Class Semantic Alignment, aided by a Modality Discrepancy Reduction loss. The approach yields state-of-the-art results on SYSU-MM01 and strong performance on RegDB and LLCM, showing that modality-specific cues are worth exploiting rather than discarding. The combination of feature-level and logit-level distillation with purification offers a practical path to more accurate cross-modal person re-identification.
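To make the style-reduction idea concrete, here is a minimal NumPy sketch. It assumes that "IN" stands for instance normalization, which the summary above does not spell out, and it models a modality "style" as a per-channel scale and shift, which is a simplification chosen only for illustration. The function name and array shapes are hypothetical, and this is not the paper's Information Purifier.

```python
import numpy as np

def instance_norm(feat, eps=1e-5):
    """Normalize each channel of each sample over its spatial dimensions.

    feat: array of shape (N, C, H, W).
    """
    mean = feat.mean(axis=(2, 3), keepdims=True)
    var = feat.var(axis=(2, 3), keepdims=True)
    return (feat - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)

# Hypothetical feature maps standing in for identity-related content.
content = rng.normal(size=(2, 4, 8, 6))

# Assumption: a "style" change is a positive per-channel scale plus a shift.
scale = rng.uniform(0.5, 2.0, size=(1, 4, 1, 1))
shift = rng.normal(size=(1, 4, 1, 1))
styled = content * scale + shift

# Mean absolute gap between the two versions, before and after normalization.
gap_before = np.abs(content - styled).mean()
gap_after = np.abs(instance_norm(content) - instance_norm(styled)).mean()

print("gap before IN:", gap_before)
print("gap after IN: ", gap_after)
```

Under this simplified model, normalization removes the per-channel statistics that carry the style while leaving the spatial pattern intact, so the two versions nearly coincide afterwards.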

Abstract

Visible-Infrared Person Re-identification (VI-ReID) is a challenging cross-modal pedestrian retrieval task, due to significant intra-class variations and cross-modal discrepancies among different cameras. Existing works mainly focus on embedding images of different modalities into a unified space to mine modality-shared features. They seek distinctive information only within these shared features, ignoring the identity-aware information that is implicit in the modality-specific features. To address this issue, we propose a novel Implicit Discriminative Knowledge Learning (IDKL) network to uncover and leverage the implicit discriminative information contained within the modality-specific features. First, we extract modality-specific and modality-shared features using a novel dual-stream network. Then, the modality-specific features undergo purification to reduce their modality style discrepancies while preserving identity-aware discriminative knowledge. Subsequently, this implicit knowledge is distilled into the modality-shared features to enhance their distinctiveness. Finally, an alignment loss is proposed to minimize the modality discrepancy in the enhanced modality-shared features. Extensive experiments on multiple public datasets demonstrate the superiority of the IDKL network over state-of-the-art methods. Code is available at https://github.com/1KK077/IDKL.
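The distillation step can be illustrated with a short NumPy sketch covering one feature-level term and one logit-level term. The loss formulas are not given in this summary, so everything below is an assumption: the feature-level term compares three softmax-normalized cosine affinity matrices (specific-specific, shared-shared, specific-shared) by mean squared error, and the logit-level term is a plain KL divergence between class distributions. All function names are hypothetical, and the official repository linked above remains the reference for the actual losses.

```python
import numpy as np

def log_softmax(z):
    """Numerically stable log-softmax over the last axis."""
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def affinity(a, b):
    """Row-wise softmax of cosine similarities between rows of a and b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return np.exp(log_softmax(a @ b.T))

def graph_alignment_loss(f_sp, f_sh):
    """Assumed feature-level term: pairwise MSE among three affinity matrices."""
    a_pp = affinity(f_sp, f_sp)  # specific vs. specific
    a_hh = affinity(f_sh, f_sh)  # shared vs. shared
    a_ph = affinity(f_sp, f_sh)  # specific vs. shared
    return (np.mean((a_pp - a_hh) ** 2)
            + np.mean((a_pp - a_ph) ** 2)
            + np.mean((a_hh - a_ph) ** 2))

def logit_distillation_loss(logits_sp, logits_sh):
    """Assumed logit-level term: KL(p_sp || p_sh), averaged over the batch."""
    log_p = log_softmax(logits_sp)
    log_q = log_softmax(logits_sh)
    return np.mean(np.sum(np.exp(log_p) * (log_p - log_q), axis=1))

rng = np.random.default_rng(0)
f_sp = rng.normal(size=(6, 16))       # hypothetical purified specific features
f_sh = rng.normal(size=(6, 16))       # hypothetical shared features
logits_sp = rng.normal(size=(6, 10))  # hypothetical class logits (10 identities)
logits_sh = rng.normal(size=(6, 10))

print("feature-level loss:", graph_alignment_loss(f_sp, f_sh))
print("logit-level loss:  ", logit_distillation_loss(logits_sp, logits_sh))
```

Both terms are zero when the shared branch reproduces the specific branch exactly, and positive otherwise, which is the property a distillation objective of this kind relies on.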

Paper Structure (16 sections, 21 equations, 6 figures, 4 tables)

Introduction
Related Work
Methodology
Modality Confuser and Discriminator
Information Purifier
Implicit Knowledge Distillation
Triplet Graph Structure Alignment (TGSA)
Class Semantic Alignment (CSA)
Modality Discrepancy Reduction (MDR)
Optimization
Experiments
Datasets and Experimental Settings
Comparison with State-of-the-art Methods
Ablation Study
Visualization Analysis
...and 1 more sections

Figures (6)

Figure 1: Previous methods focused on seeking discriminative information within modality-shared features, overlooking the discriminative clues that are implicit in modality-specific features. This implicit discriminative information can be used to enhance the shared invariant features.
Figure 2: Framework of the proposed Implicit Discriminative Knowledge Learning (IDKL) model. The dual one-stream network, built from ResNet blocks, first extracts the modality-specific feature $\boldsymbol{F}_{sp}$ and the modality-shared feature $\boldsymbol{F}_{sh}$ under the constraints of a modality discriminator and a modality confuser, respectively, while the common ReID loss provides the base optimization of the network. Then, the modality-specific feature is fed into the information purifier, which regulates the modality style discrepancy while preserving the implicit discriminative information, yielding the purified modality-specific feature $\widetilde{\boldsymbol{F}}_{sp}$. Subsequently, this implicit knowledge is distilled into the modality-shared feature through TGSA and CSA. Finally, $\mathcal{L}_{mdr}$ is proposed to minimize the modality discrepancy within the enhanced modality-shared feature.
Figure 3: Illustration of the proposed TGSA loss: 'a' and 'b' denote two different types of features. '$\boldsymbol{A}$' represents the graph structure affinity matrix. After aligning the three affinity matrices, the discrepancy in the graph structure distribution between features 'a' and features 'b' will be eliminated.
Figure 4: Observing the implicit discriminative information with Grad-CAM. 'sh' and 'sp(im)' denote the modality-shared feature and the modality-specific feature of the trained IDKL w/o knowledge distillation, respectively; 'enhanced' denotes the modality-shared feature of IDKL w/ knowledge distillation.
Figure 5: Ablation analysis of the hyper-parameters $\lambda_1$, $\lambda_2$, and $\lambda_3$ for $\mathcal{L}_{ip}$, $\mathcal{L}_{tgsa}$, and $\mathcal{L}_{csa}$, respectively, on the SYSU-MM01 dataset.
...and 1 more figures
