Prototype-Driven Multi-Feature Generation for Visible-Infrared Person Re-identification
Jiarui Li, Zhen Qiu, Yilin Yang, Yuqi Li, Zeyu Dong, Chuanguang Yang
TL;DR
This work tackles visible-infrared person re-identification (VI-ReID), where the modality gap between visible and infrared images is compounded by misalignment from pose and viewpoint changes. It introduces Prototype-Driven Multi-Feature Generation (PDM), comprising a Multi-Feature Generation Module (MFGM) that produces diverse yet distribution-consistent features via center-guided mining, and a Prototype Learning Module (PLM) that uses learnable prototypes to uncover semantically similar local features across modalities. Training adds a center-guided pair mining loss $\mathcal{L}_{cpm}$, a cosine heterogeneity loss $\mathcal{L}_{ch}$ that diversifies the prototypes, and a dual-center separation loss $\mathcal{L}_{dcs}$ that sharpens discrimination. The total objective is $\mathcal{L}_{total}=\mathcal{L}_{id}+\mathcal{L}_{plm}+\mathcal{L}_{cpm}$ with $\mathcal{L}_{plm}=\mathcal{L}_{tri}+\mathcal{L}_{ch}+\mathcal{L}_{dcs}$. Experiments on SYSU-MM01 and LLCM report state-of-the-art performance, supporting the value of generating closely distributed diverse features and mining latent cross-modal semantics for instance-level alignment. Code is provided for reproduction.
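The way the loss terms above combine can be sketched in a few lines. This summary does not give the exact form of the cosine heterogeneity loss, so the sketch assumes one plausible version: the mean cosine similarity over all distinct prototype pairs, which decreases as prototypes point in different directions. The function names `cosine_heterogeneity` and `total_loss` are hypothetical, and the paper's actual formulation may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Small epsilon guards against division by zero for all-zero vectors.
    return dot / (norm_u * norm_v + 1e-12)


def cosine_heterogeneity(prototypes):
    """ASSUMED form of L_ch: mean cosine similarity over distinct prototype pairs.

    Lower values mean the prototypes are more diverse in direction.
    """
    k = len(prototypes)
    if k < 2:
        return 0.0  # a single prototype has no pair to compare against
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    return sum(cosine(prototypes[i], prototypes[j]) for i, j in pairs) / len(pairs)


def total_loss(l_id, l_tri, l_ch, l_dcs, l_cpm):
    """Unweighted sum, following L_total = L_id + L_plm + L_cpm."""
    l_plm = l_tri + l_ch + l_dcs  # L_plm = L_tri + L_ch + L_dcs
    return l_id + l_plm + l_cpm
```

Under this assumed form, mutually orthogonal prototypes give a loss of zero and identical prototypes give a loss close to one, which matches the stated goal of diversifying the prototypes.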
Abstract
The primary challenges in visible-infrared person re-identification arise from the differences between visible (vis) and infrared (ir) images, including inter-modal and intra-modal variations. These challenges are further complicated by varying viewpoints and irregular movements. Existing methods often rely on horizontal partitioning to align part-level features, which can introduce inaccuracies and has limited effectiveness in reducing modality discrepancies. In this paper, we propose a novel Prototype-Driven Multi-Feature Generation framework (PDM) that mitigates cross-modal discrepancies by constructing diversified features and mining latent semantically similar features for modal alignment. PDM comprises two key components: the Multi-Feature Generation Module (MFGM) and the Prototype Learning Module (PLM). The MFGM generates diverse, closely distributed features from modality-shared features to represent pedestrians. The PLM uses learnable prototypes to uncover latent semantic similarities among local features of the visible and infrared modalities, thereby facilitating cross-modal instance-level alignment. We introduce the cosine heterogeneity loss to enhance prototype diversity so that rich local features can be extracted. Extensive experiments conducted on the SYSU-MM01 and LLCM datasets demonstrate that our approach achieves state-of-the-art performance. Our code is available at https://github.com/mmunhappy/ICASSP2025-PDM.
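To make the idea of prototypes gathering semantically similar local features more concrete, here is a minimal sketch. The abstract does not say how prototypes interact with local features, so this assumes a simple dot-product attention: each prototype scores every local feature, the scores are normalized with a softmax, and the features are averaged with those weights. The name `prototype_pool` and the attention form are illustrative assumptions, not the authors' implementation.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def prototype_pool(local_feats, prototypes):
    """ASSUMED mechanism: one attention-pooled feature per prototype.

    local_feats: list of N local feature vectors (each of length D)
    prototypes:  list of K prototype vectors (each of length D)
    returns:     list of K pooled vectors (each of length D)
    """
    dim = len(local_feats[0])
    pooled = []
    for p in prototypes:
        # Dot-product score between this prototype and each local feature.
        scores = [sum(a * b for a, b in zip(p, f)) for f in local_feats]
        weights = softmax(scores)
        # Weighted average of local features under the attention weights.
        pooled.append(
            [sum(w * f[d] for w, f in zip(weights, local_feats)) for d in range(dim)]
        )
    return pooled
```

Because the same prototypes would be applied to both visible and infrared local features, the pooled vectors at a given prototype index can be compared across modalities, which is one way to read the instance-level alignment described above.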
