Prototype-Driven Multi-Feature Generation for Visible-Infrared Person Re-identification

Jiarui Li, Zhen Qiu, Yilin Yang, Yuqi Li, Zeyu Dong, Chuanguang Yang

TL;DR

This work tackles cross-modal person re-identification between visible and infrared images by addressing modality gaps and misalignment due to pose and viewpoint. It introduces Prototype-Driven Multi-Feature Generation (PDM), comprising a Multi-Feature Generation Module (MFGM) that produces diverse yet distribution-consistent features via center-guided mining, and a Prototype Learning Module (PLM) that leverages learnable prototypes to reveal semantically similar local cross-modal features. The approach is reinforced with a center-guided pair mining loss $L_{cpm}$, cosine heterogeneity loss $L_{ch}$ to diversify prototypes, and a dual-center separation loss $L_{dcs}$ to sharpen discrimination, integrated as $\mathcal{L}_{total}=\mathcal{L}_{id}+\mathcal{L}_{plm}+\mathcal{L}_{cpm}$ with $\mathcal{L}_{plm}=\mathcal{L}_{tri}+\mathcal{L}_{ch}+\mathcal{L}_{dcs}$. Extensive experiments on SYSU-MM01 and LLCM demonstrate state-of-the-art performance in cross-modal VI-ReID, validating the effectiveness of generating closely distributed diverse features and mining latent cross-modal semantics for robust instance-level alignment. The work provides code for reproduction and sets a benchmark for prototype-driven cross-modal feature learning in surveillance contexts.
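Substituting the second expression into the first writes the full objective as a flat sum of five terms. This is a direct expansion of the two formulas above; it assumes unit weights on every term, since no weighting coefficients are stated here:

$$\mathcal{L}_{total}=\mathcal{L}_{id}+\mathcal{L}_{tri}+\mathcal{L}_{ch}+\mathcal{L}_{dcs}+\mathcal{L}_{cpm}$$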

Abstract

The primary challenges in visible-infrared person re-identification arise from the differences between visible (vis) and infrared (ir) images, including inter-modal and intra-modal variations. These challenges are further complicated by varying viewpoints and irregular movements. Existing methods often rely on horizontal partitioning to align part-level features, which can introduce inaccuracies and has limited effectiveness in reducing modality discrepancies. In this paper, we propose a novel Prototype-Driven Multi-feature generation framework (PDM) that mitigates cross-modal discrepancies by constructing diversified features and mining latent semantically similar features for modal alignment. PDM comprises two key components: a Multi-Feature Generation Module (MFGM) and a Prototype Learning Module (PLM). The MFGM generates diverse, closely distributed features from modality-shared features to represent pedestrians. The PLM uses learnable prototypes to mine latent semantic similarities among local features across the visible and infrared modalities, thereby facilitating cross-modal instance-level alignment. We introduce the cosine heterogeneity loss to enhance prototype diversity so that the prototypes extract rich local features. Extensive experiments on the SYSU-MM01 and LLCM datasets demonstrate that our approach achieves state-of-the-art performance. Our code is available at https://github.com/mmunhappy/ICASSP2025-PDM.
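
The idea of a cosine-based penalty that pushes prototypes apart can be illustrated with a small sketch. The exact form of the cosine heterogeneity loss is not given in this summary, so the code below is an assumption rather than the authors' definition: it takes the mean absolute cosine similarity over all distinct prototype pairs, and the name `prototype_diversity_penalty` is hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def prototype_diversity_penalty(prototypes):
    """Mean absolute cosine similarity over all distinct prototype pairs.

    Assumed, illustrative form only (not the paper's formula).
    Returns 0.0 for mutually orthogonal prototypes and 1.0 when
    every prototype points along the same line.
    """
    k = len(prototypes)
    if k < 2:
        return 0.0  # a single prototype has nothing to be diverse from
    total = 0.0
    pairs = 0
    for i in range(k):
        for j in range(i + 1, k):
            total += abs(cosine_similarity(prototypes[i], prototypes[j]))
            pairs += 1
    return total / pairs


# Two toy prototype sets: one fully diverse, one collapsed onto one direction.
orthogonal = [[1.0, 0.0], [0.0, 1.0]]
collapsed = [[1.0, 0.0], [2.0, 0.0]]
print(prototype_diversity_penalty(orthogonal))
print(prototype_diversity_penalty(collapsed))
```

Minimizing a penalty of this kind during training would discourage prototypes from collapsing onto the same direction, which is the stated purpose of enhancing prototype diversity. A real implementation would operate on learnable tensors in a deep learning framework rather than on Python lists.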

Paper Structure

This paper contains 13 sections, 12 equations, 3 figures, and 3 tables.

Figures (3)

  • Figure 1: The Framework of PDM.
  • Figure 2: (a-d) show the intra-class and inter-class distances of cross-modality features, in blue and green, respectively. (e-h) show t-SNE (van der Maaten and Hinton, 2008) visualizations of the 2D feature distributions, where circles and triangles denote the infrared and visible modalities, and different colors represent different pedestrian identities.
  • Figure 3: Visualization of attention maps. (a) shows the input image; (b) and (c) show the results of the baseline and PDM, respectively.