Adaptive Illumination-Invariant Synergistic Feature Integration in a Stratified Granular Framework for Visible-Infrared Re-Identification
Yuheng Jia, Wesley Armour
TL;DR
AMINet tackles visible-infrared person re-identification by bridging RGB-IR modality gaps under varying illumination and occlusion. It introduces a Hierarchical Multi-Granular Dual-Branch Network (HMG-DBNet) with an Interactive Feature Fusion Strategy (IFFS), Phase-Enhanced Structural Attention Module (PESAM), and Adaptive Multi-Scale Kernel MMD (AMK-MMD). The approach extracts global and local features from full-body and upper-body views, fuses intra- and cross-modality cues, and aligns distributions with multi-scale kernels and phase-based illumination-invariant features. It achieves state-of-the-art results on SYSU-MM01 and RegDB, demonstrating strong cross-modal generalization and scalability for VI-ReID.
Abstract
Visible-Infrared Person Re-Identification (VI-ReID) plays a crucial role in applications such as search and rescue, infrastructure protection, and nighttime surveillance. However, it faces significant challenges due to modality discrepancies, varying illumination, and frequent occlusions. To overcome these obstacles, we propose AMINet, an Adaptive Modality Interaction Network. AMINet employs multi-granularity feature extraction to capture comprehensive identity attributes from both full-body and upper-body images, improving robustness against occlusions and background clutter. The model integrates an interactive feature fusion strategy for deep intra-modal and cross-modal alignment, enhancing generalization and bridging the RGB-IR modality gap. Furthermore, AMINet uses phase congruency for robust, illumination-invariant feature extraction and incorporates an adaptive multi-scale kernel MMD to align feature distributions across varying scales. Extensive experiments on benchmark datasets demonstrate the effectiveness of our approach: it achieves a Rank-1 accuracy of 74.75% on SYSU-MM01, surpassing the baseline by 7.93% and outperforming the current state of the art by 3.95%.
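To make the distribution-alignment idea concrete, the following is a minimal sketch of a multi-scale kernel MMD between two feature sets. It is not the paper's AMK-MMD: the adaptive part is omitted, and the fixed bandwidths, the equal kernel weights, the function names (`pairwise_sq_dists`, `multi_kernel_mmd2`), and the synthetic "RGB" and "IR" features are all assumptions made for illustration. It uses the standard biased estimator of squared MMD with a sum of Gaussian kernels.

```python
import numpy as np


def pairwise_sq_dists(a, b):
    """Squared Euclidean distances between rows of a (n, d) and b (m, d)."""
    a_sq = np.sum(a * a, axis=1)[:, None]
    b_sq = np.sum(b * b, axis=1)[None, :]
    # Clip tiny negative values caused by floating-point rounding.
    return np.maximum(a_sq + b_sq - 2.0 * (a @ b.T), 0.0)


def multi_kernel_mmd2(x, y, bandwidths=(0.5, 1.0, 2.0, 4.0)):
    """Biased estimate of squared MMD with an equal-weight Gaussian kernel mix.

    The bandwidths are fixed here (an assumption); an adaptive variant
    would choose or weight them from the data.
    """
    def kernel(a, b):
        d2 = pairwise_sq_dists(a, b)
        mix = sum(np.exp(-d2 / (2.0 * s ** 2)) for s in bandwidths)
        return mix / len(bandwidths)

    return kernel(x, x).mean() + kernel(y, y).mean() - 2.0 * kernel(x, y).mean()


# Synthetic stand-ins for embeddings from the two modalities (hypothetical data).
rng = np.random.default_rng(0)
rgb_feats = rng.normal(0.0, 1.0, size=(64, 16))
ir_feats_shifted = rng.normal(1.0, 1.0, size=(64, 16))  # simulated modality gap
ir_feats_aligned = rng.normal(0.0, 1.0, size=(64, 16))  # same distribution as RGB

mmd_shifted = multi_kernel_mmd2(rgb_feats, ir_feats_shifted)
mmd_aligned = multi_kernel_mmd2(rgb_feats, ir_feats_aligned)
print(f"MMD^2 with modality gap: {mmd_shifted:.4f}")
print(f"MMD^2 when aligned:      {mmd_aligned:.4f}")
```

Used as a training loss, a term of this kind would be minimized so that visible and infrared embeddings become statistically harder to tell apart; mixing several bandwidths keeps the measure sensitive to discrepancies at more than one scale.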
