Adaptive Illumination-Invariant Synergistic Feature Integration in a Stratified Granular Framework for Visible-Infrared Re-Identification

Yuheng Jia, Wesley Armour

TL;DR

AMINet tackles visible-infrared person re-identification (VI-ReID) by bridging the RGB-IR modality gap under varying illumination and occlusion. It introduces a Hierarchical Multi-Granular Dual-Branch Network (HMG-DBNet) with an Interactive Feature Fusion Strategy (IFFS), a Phase-Enhanced Structural Attention Module (PESAM), and an Adaptive Multi-Scale Kernel Maximum Mean Discrepancy (AMK-MMD). The approach extracts global and local features from full-body and upper-body views, fuses intra- and cross-modality cues, aligns feature distributions with multi-scale kernels, and exploits phase-based, illumination-invariant features. It achieves state-of-the-art results on SYSU-MM01 and RegDB, demonstrating strong cross-modal generalization and scalability for VI-ReID.
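
This summary does not specify AMK-MMD's kernel family, bandwidth adaptation, or kernel weights. As a rough illustration only, the sketch computes a standard biased multi-kernel MMD² between two feature batches, using a weighted sum of Gaussian kernels. The "adaptive" part is replaced by an assumption: bandwidths are fixed multiples of the median pairwise distance (the median heuristic), with uniform weights. The names `mk_mmd2` and `pairwise_sq_dists`, the `scales` values, and the synthetic features are all hypothetical, not the paper's implementation.

```python
import numpy as np


def pairwise_sq_dists(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Squared Euclidean distances ||a_i - b_j||^2 for all row pairs."""
    a2 = np.sum(a * a, axis=1)[:, None]
    b2 = np.sum(b * b, axis=1)[None, :]
    # Clamp tiny negative values caused by floating-point cancellation.
    return np.maximum(a2 + b2 - 2.0 * a @ b.T, 0.0)


def mk_mmd2(x, y, scales=(0.25, 0.5, 1.0, 2.0, 4.0), weights=None):
    """Biased squared MMD between samples x (n, d) and y (m, d), using a
    weighted sum of Gaussian kernels at several bandwidths."""
    z = np.concatenate([x, y], axis=0)
    d2 = pairwise_sq_dists(z, z)
    # Assumption: bandwidths are multiples of the median pairwise distance
    # (median heuristic), standing in for AMINet's unspecified adaptive scheme.
    off_diag = d2[~np.eye(len(z), dtype=bool)]
    base = np.sqrt(np.median(off_diag)) + 1e-12
    if weights is None:
        weights = np.full(len(scales), 1.0 / len(scales))  # uniform, by assumption
    # Multi-scale kernel: sum_s w_s * exp(-d^2 / (2 * sigma_s^2))
    k = sum(w * np.exp(-d2 / (2.0 * (base * s) ** 2)) for s, w in zip(scales, weights))
    n = len(x)
    # MMD^2 = E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)]
    return k[:n, :n].mean() + k[n:, n:].mean() - 2.0 * k[:n, n:].mean()


rng = np.random.default_rng(0)
rgb_feats = rng.normal(0.0, 1.0, size=(64, 16))   # stand-in visible features
ir_shifted = rng.normal(1.5, 1.0, size=(64, 16))  # stand-in IR features, shifted distribution
ir_aligned = rng.normal(0.0, 1.0, size=(64, 16))  # stand-in IR features, same distribution
print(mk_mmd2(rgb_feats, ir_shifted), mk_mmd2(rgb_feats, ir_aligned))
```

The shifted batch yields a clearly larger value than the aligned one. In a training loop, a term like this would be minimized between visible and infrared features so that their distributions move closer at several kernel scales at once.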

Abstract

Visible-Infrared Person Re-Identification (VI-ReID) plays a crucial role in applications such as search and rescue, infrastructure protection, and nighttime surveillance. However, it faces significant challenges due to modality discrepancies, varying illumination, and frequent occlusions. To overcome these obstacles, we propose AMINet, an Adaptive Modality Interaction Network. AMINet employs multi-granularity feature extraction to capture comprehensive identity attributes from both full-body and upper-body images, improving robustness against occlusions and background clutter. The model integrates an interactive feature fusion strategy for deep intra-modal and cross-modal alignment, which enhances generalization and effectively bridges the RGB-IR modality gap. Furthermore, AMINet uses phase congruency for robust, illumination-invariant feature extraction and incorporates an adaptive multi-scale kernel maximum mean discrepancy (MMD) to align feature distributions across varying scales. Extensive experiments on benchmark datasets demonstrate the effectiveness of our approach, achieving a Rank-1 accuracy of 74.75% on SYSU-MM01, surpassing the baseline by 7.93% and outperforming the current state of the art by 3.95%.
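
Phase congruency measures how well the Fourier components of a signal agree in phase at each point. Because it is normalized by the sum of local amplitudes, it is insensitive to contrast and brightness changes, which is the property the abstract relies on. The sketch computes a simplified 1D version using the local-energy formulation PC(x) = |sum_n A_n(x) exp(i phi_n(x))| / (sum_n A_n(x) + eps) with a bank of log-Gabor filters. It is not AMINet's PESAM: the paper applies phase congruency to 2D images together with edge-guided attention, and full implementations (for example Kovesi's) add noise compensation and frequency-spread weighting, both omitted here. The function names, filter parameters, and test signal are assumptions chosen for illustration.

```python
import numpy as np


def log_gabor_bank(n, n_scales=4, min_wavelength=4.0, mult=2.0, sigma_on_f=0.55):
    """One-sided 1D log-Gabor filters in the frequency domain, one per scale.
    Parameter values are illustrative assumptions."""
    freqs = np.fft.fftfreq(n)          # signed frequencies in cycles/sample
    radius = np.abs(freqs)
    radius[0] = 1.0                    # avoid log(0); the DC bin is zeroed below
    bank = []
    for s in range(n_scales):
        f0 = 1.0 / (min_wavelength * mult ** s)  # centre frequency of this scale
        g = np.exp(-np.log(radius / f0) ** 2 / (2.0 * np.log(sigma_on_f) ** 2))
        g[0] = 0.0                     # log-Gabor filters have no DC response
        # Keeping only positive frequencies (doubled) makes the filtered output
        # analytic: real part = even-symmetric response, imag part = odd-symmetric.
        g[freqs < 0] = 0.0
        bank.append(2.0 * g)
    return bank


def phase_congruency_1d(signal, eps=1e-4, **bank_kwargs):
    """Local-energy phase congruency |sum_n A_n e^{i phi_n}| / (sum_n A_n + eps)."""
    spectrum = np.fft.fft(np.asarray(signal, dtype=float))
    bank = log_gabor_bank(len(spectrum), **bank_kwargs)
    responses = [np.fft.ifft(spectrum * g) for g in bank]  # complex response per scale
    energy = np.abs(sum(responses))                 # local energy |E(x)|
    amplitude = sum(np.abs(r) for r in responses)   # sum of per-scale amplitudes A_n(x)
    return energy / (amplitude + eps)


rng = np.random.default_rng(0)
row = rng.normal(size=256)                          # stand-in for one image row
pc_dark = phase_congruency_1d(row)
pc_bright = phase_congruency_1d(3.0 * row + 10.0)   # contrast x3, brightness +10
print(np.max(np.abs(pc_dark - pc_bright)))
```

Scaling a signal by c multiplies every A_n and the local energy by c, so the ratio changes only through eps; adding a constant alters only the DC bin, which the log-Gabor filters discard. The printed maximum difference should therefore be close to zero.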

Paper Structure

This paper contains 15 sections, 11 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: The proposed framework employs an Interactive Feature Fusion Strategy (IFFS) within a dual-stream network (HMG-DBNet) to capture full-body and partial-body features effectively. PESAM uses phase congruency and edge-guided attention for robust RGB-IR alignment. AMK-MMD adaptively aligns feature distributions to reduce modality discrepancies. The multi-branch design, supervised by ID, MMD, and triplet losses, improves the overall accuracy of cross-modality person re-identification.
  • Figure 2: Impact of Upper Body Proportion (UBP) on Model Accuracy for SYSU (left) and RegDB (right) datasets.
  • Figure 3: Rank-1 and mAP results on SYSU and RegDB datasets.
  • Figure 4: t-SNE Visualizations and Feature Distance Distributions for Different Models, Demonstrating Cross-Modality Feature Alignment and Intra/Inter-Class Separation.
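
Figure 3 and the abstract report Rank-1 accuracy and mAP. For readers unfamiliar with these metrics, the following is a minimal sketch of how they are commonly computed from a query-by-gallery distance matrix (for example, infrared queries against a visible gallery). It deliberately omits details of the official SYSU-MM01 and RegDB protocols, such as camera-based gallery filtering and averaging over repeated trials; the function name and toy data are hypothetical.

```python
import numpy as np


def rank1_and_map(dist, query_ids, gallery_ids):
    """Rank-1 accuracy and mean average precision (mAP).

    dist: (num_query, num_gallery) array; smaller means more similar.
    """
    order = np.argsort(dist, axis=1)                    # gallery indices, nearest first
    matches = gallery_ids[order] == query_ids[:, None]  # True where identities agree
    rank1_hits, aps = [], []
    for row in matches:
        if not row.any():
            continue                                    # no true match in gallery: skip query
        rank1_hits.append(row[0])                       # is the nearest item correct?
        hit_ranks = np.flatnonzero(row) + 1             # 1-based ranks of true matches
        precision_at_hits = np.arange(1, len(hit_ranks) + 1) / hit_ranks
        aps.append(precision_at_hits.mean())            # average precision of this query
    return float(np.mean(rank1_hits)), float(np.mean(aps))


dist = np.array([[0.1, 0.9, 0.5],
                 [0.2, 0.3, 0.8]])
query_ids = np.array([1, 2])
gallery_ids = np.array([1, 2, 1])
print(rank1_and_map(dist, query_ids, gallery_ids))  # (0.5, 0.75)
```

Here the first query's nearest gallery item shares its identity (a Rank-1 hit, AP = 1.0), while the second query's only match is ranked second (AP = 0.5), giving Rank-1 = 0.5 and mAP = 0.75.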