Bidirectional Multi-Step Domain Generalization for Visible-Infrared Person Re-Identification

Mahdi Alehdaghi, Pourya Shamsolmoali, Rafael M. O. Cruz, Eric Granger

TL;DR

This work addresses cross-modal visible-infrared person re-identification by learning modality-invariant embeddings for RGB and IR images. It proposes Bidirectional Multi-Step Domain Generalization (BMDG), which jointly learns discriminative body-part prototypes (across $K$ parts) and progressively builds multiple intermediate feature spaces via a bidirectional prototype-mixing strategy $\mathcal{G}$, enhanced by Attentive Prototype Embedding (APE). A prototype-alignment module uses hierarchical contrastive learning to ensure prototypes are complementary, interchangeable, and ID-discriminative, while the bidirectional learning refines the embedding through $T$ intermediate steps to bridge the modality gap without bias toward a single modality. Empirically, BMDG achieves state-of-the-art results on SYSU-MM01, RegDB, and LLCM datasets and can be integrated into other part-based V-I ReID methods, offering improved robustness for cross-modal retrieval in practical surveillance settings.
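To make the bidirectional prototype-mixing idea ($\mathcal{G}$ over $T$ steps) more concrete, here is a hypothetical NumPy sketch. It is not the paper's $\mathcal{G}$. It assumes that at step $t$ the first $\lfloor tK/T \rfloor$ parts of a fixed random part ordering are swapped between modalities. Under that assumption, each V-to-I intermediate representation holds progressively more infrared prototypes, and each I-to-V one holds progressively more visible prototypes. The function name `bidirectional_mixing` and the linear swap schedule are illustrative assumptions.

```python
import numpy as np


def bidirectional_mixing(proto_v, proto_i, num_steps, rng):
    """Hypothetical bidirectional multi-step prototype mixing (not the paper's G).

    proto_v, proto_i: (K, D) body-part prototypes of the visible (V) and
    infrared (I) images of one identity. Returns two lists of num_steps
    (K, D) arrays: V-to-I and I-to-V intermediate representations.
    """
    K = proto_v.shape[0]
    order = rng.permutation(K)  # assumed: random but fixed order in which parts are swapped
    v_to_i, i_to_v = [], []
    for t in range(1, num_steps + 1):
        # assumed linear schedule: floor(t*K/T) parts swapped at step t (all K at t = T)
        swapped = order[: (t * K) // num_steps]
        # V-to-I: start from visible prototypes, replace swapped parts with infrared ones
        mixed_vi = proto_v.copy()
        mixed_vi[swapped] = proto_i[swapped]
        # I-to-V: start from infrared prototypes, replace swapped parts with visible ones
        mixed_iv = proto_i.copy()
        mixed_iv[swapped] = proto_v[swapped]
        v_to_i.append(mixed_vi)
        i_to_v.append(mixed_iv)
    return v_to_i, i_to_v


rng = np.random.default_rng(0)
pv, pi = np.zeros((6, 4)), np.ones((6, 4))  # toy prototypes: V rows are 0, I rows are 1
vi, iv = bidirectional_mixing(pv, pi, num_steps=3, rng=rng)
print([int(m[:, 0].sum()) for m in vi])  # prints [2, 4, 6]: infrared parts grow per step
```

Because the swapped sets are nested and the last step covers all $K$ parts, the $T$ intermediate spaces in this sketch move from V toward I and from I toward V symmetrically; this is one possible reading of "without bias toward a single modality", not a claim about the paper's exact schedule.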

Abstract

A key challenge in visible-infrared person re-identification (V-I ReID) is training a backbone model capable of effectively addressing the significant discrepancies across modalities. State-of-the-art methods that generate a single intermediate bridging domain are often less effective, as this generated domain may not adequately capture sufficient common discriminant information. This paper introduces Bidirectional Multi-step Domain Generalization (BMDG), a novel approach for unifying feature representations across diverse modalities. BMDG creates multiple virtual intermediate domains by learning and aligning body part features extracted from both the infrared (I) and visible (V) modalities. In particular, our method aims to minimize the cross-modal gap in two steps. First, BMDG aligns modalities in the feature space by learning shared and modality-invariant body part prototypes from V and I images. Then, it generalizes the feature representation by applying bidirectional multi-step learning, which progressively refines feature representations in each step and incorporates more prototypes from both modalities. Based on these prototypes, multiple bridging steps enhance the feature representation. Experiments conducted on V-I ReID datasets indicate that our BMDG approach can outperform state-of-the-art part-based and intermediate generation methods, and can be integrated into other part-based methods to enhance their V-I ReID performance. (Our code is available at: https://alehdaghi.github.io/BMDG/)
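The first stage described in the abstract (learning shared, modality-invariant body-part prototypes from V and I images) can be illustrated with a deliberately simplified loss. This sketch is an assumption: it swaps in a plain symmetric cross-modal InfoNCE over part prototypes in place of the paper's hierarchical contrastive learning (HCL), whose exact form is not given here. The function `cross_modal_part_infonce`, the temperature `tau`, and the (N, K, D) batch layout (N identities, K parts, D dimensions, with row n of each modality belonging to the same identity) are hypothetical.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def mean_diag_cross_entropy(logits):
    """Softmax cross-entropy where the positive for row r is column r."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


def cross_modal_part_infonce(proto_v, proto_i, tau=0.1):
    """Simplified stand-in for prototype alignment (not the paper's HCL).

    Positive pair: same identity and same part in the other modality.
    Negatives: every other (identity, part) prototype of the other modality.
    """
    N, K, D = proto_v.shape
    zv = l2_normalize(proto_v.reshape(N * K, D))
    zi = l2_normalize(proto_i.reshape(N * K, D))
    logits = zv @ zi.T / tau  # (N*K, N*K) temperature-scaled cosine similarities
    # symmetric loss: V queries against I keys, and I queries against V keys
    return 0.5 * (mean_diag_cross_entropy(logits) + mean_diag_cross_entropy(logits.T))


rng = np.random.default_rng(0)
pv = rng.normal(size=(4, 2, 16))
print(cross_modal_part_infonce(pv, pv.copy()))             # aligned modalities: low loss
print(cross_modal_part_infonce(pv, np.roll(pv, 1, axis=0)))  # identities mismatched: high loss
```

Treating every (identity, part) pair as its own class pushes prototypes to be both part-specific and ID-discriminative; a hierarchical variant, as the name HCL suggests, would presumably add further levels of positives, which this sketch leaves out.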

Paper Structure (29 sections, 2 theorems, 37 equations, 10 figures, 8 tables, 1 algorithm)

Key Result

Theorem 1

Let $P^1,\dots, P^K$ and $Y$ be random variables with domains $\mathcal{P}^1,\dots,\mathcal{P}^K$ and $\mathcal{Y}$, respectively. Let every pair $P^k$ and $P^q$ ($k \neq q$) be independent. Then, maximizing $\text{MI}(P^1,\dots,P^K;Y)$ can be approximated by maximizing the sum of MI between each $P^k$ and $Y$, i.e., $\sum_{k=1}^{K}\text{MI}(P^k;Y)$.
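For the two-part case ($K=2$), a standard chain-rule identity makes the approximation concrete. If $P^1$ and $P^2$ are independent, then $\text{MI}(P^1,P^2;Y) = \text{MI}(P^1;Y) + \text{MI}(P^2;Y) + \text{MI}(P^1;P^2 \mid Y)$, so the sum of per-part terms is a lower bound on the joint term, with gap $\text{MI}(P^1;P^2 \mid Y) \ge 0$. The NumPy sketch below is an illustration of this identity, not the paper's proof: it uses two independent fair bits as toy "parts" (an assumption) and compares the joint MI with the per-part sum for a label that reveals both parts and for an XOR label.

```python
import numpy as np


def entropy_bits(p):
    """Shannon entropy (bits) of a probability array, ignoring zero cells."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())


def mutual_information(joint):
    """MI (bits) between the row and column variables of a 2-D joint table."""
    return (entropy_bits(joint.sum(axis=1))
            + entropy_bits(joint.sum(axis=0))
            - entropy_bits(joint.ravel()))


def joint_vs_sum(joint3):
    """joint3[a, b, y] = Pr(P1=a, P2=b, Y=y). Returns (MI(P1,P2;Y), MI(P1;Y)+MI(P2;Y))."""
    A, B, C = joint3.shape
    mi_joint = mutual_information(joint3.reshape(A * B, C))  # treat (P1, P2) as one variable
    mi_sum = (mutual_information(joint3.sum(axis=1))         # table over (P1, Y)
              + mutual_information(joint3.sum(axis=0)))      # table over (P2, Y)
    return mi_joint, mi_sum


def table(label_fn):
    """Independent fair bits P1, P2 and a deterministic label Y = label_fn(P1, P2)."""
    ys = [label_fn(a, b) for a in (0, 1) for b in (0, 1)]
    t = np.zeros((2, 2, max(ys) + 1))
    for a in (0, 1):
        for b in (0, 1):
            t[a, b, label_fn(a, b)] = 0.25
    return t


identity_case = table(lambda a, b: 2 * a + b)  # Y reveals both parts: bound is tight
xor_case = table(lambda a, b: a ^ b)            # Y depends only on the interaction: bound is loose
print(joint_vs_sum(identity_case))
print(joint_vs_sum(xor_case))
```

In the first case both quantities equal 2 bits; in the XOR case the joint MI is 1 bit while each per-part term is 0, so in this toy setting maximizing the per-part sum tracks the joint objective only when the parts carry label information individually.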

Figures (10)

  • Figure 1: A comparison of training architectures for V-I ReID. Approaches based on (a) a global feature representation, and (b) a local part-based representation that preserves locality alongside global features. (c) Generation of an intermediate bridging domain to guide training. (d) Our BMDG extracts and combines prototypes from both modalities in each step to gradually create multiple intermediate bridging domains.
  • Figure 2: (a) Overall BMDG training architecture, which comprises two parts. The prototype alignment module (left) extracts body part prototype representations from V and I images. The bidirectional multi-step learning module (right) extracts discriminant features using multiple intermediate domains created by mixing prototype information. (b) The prototype discovery (PD) architecture mines prototypes from spatial features, and hierarchical contrastive learning (HCL) encourages the prototypes to focus on similar semantics for all individuals without losing ID-discriminative information.
  • Figure A.1: Venn diagram of theoretic measures for three variables $X$, $Y$, and $Z$, represented by the lower left, upper, and lower right circles, respectively.
  • Figure B.1: Attentive prototype embedding (APE) architecture.
  • Figure D.1: Accuracy of the proposed BMDG over $\lambda_\textit{f}$, $\lambda_\textit{v}$, $\lambda_\textit{i}$, and $\lambda_\textit{p}$ values on SYSU-MM01 dataset in all-search and single-shot mode.
  • ...and 5 more figures
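Figure 2(b) describes a prototype discovery (PD) module that mines body-part prototypes from spatial features. A common way to realize such a module is attention pooling with $K$ learnable part queries; the NumPy sketch below assumes that design and shows only the forward pass for a single image. The names `discover_part_prototypes` and `part_queries` and the scaled dot-product scoring are hypothetical, not taken from the paper.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def discover_part_prototypes(feature_map, part_queries):
    """Hypothetical attention-pooling sketch of prototype discovery.

    feature_map: (C, H, W) backbone features of one image.
    part_queries: (K, C) learnable part vectors (assumed parameterization).
    Returns (K, C) part prototypes and (K, H, W) spatial attention maps.
    """
    C, H, W = feature_map.shape
    feats = feature_map.reshape(C, H * W)        # (C, HW) one column per location
    scores = part_queries @ feats                # (K, HW) query-location affinities
    attn = softmax(scores / np.sqrt(C), axis=1)  # each part's map sums to 1 over space
    prototypes = attn @ feats.T                  # (K, C) attention-weighted average pooling
    return prototypes, attn.reshape(-1, H, W)


rng = np.random.default_rng(0)
protos, maps = discover_part_prototypes(rng.normal(size=(8, 6, 3)), rng.normal(size=(4, 8)))
print(protos.shape, maps.shape)  # prints (4, 8) (4, 6, 3)
```

Because each attention map is a distribution over locations, every prototype is a convex combination of local feature vectors, which keeps it in the same space as the backbone features; the paper's PD and its training signal (HCL) may differ from this sketch.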

Theorems & Definitions (4)

  • Theorem 1
  • Proof 1
  • Proposition 1
  • Proof 2