Adaptive Generation of Privileged Intermediate Information for Visible-Infrared Person Re-Identification

Mahdi Alehdaghi, Arthur Josi, Pourya Shamsolmoali, Rafael M. O. Cruz, Eric Granger

TL;DR

This work tackles the challenging cross-modal visible–infrared (V–I) person re-identification problem by introducing AGPI$^2$, a training-time framework that generates a privileged intermediate domain $Z$ to bridge the V and I distributions. The approach combines a generator, an ID-modality discriminator, and a feature-embedding backbone under the Learning Under Privileged Information (LUPI) paradigm, guided by mutual-information objectives that maximize identity-relevant information in $Z$ while suppressing modality cues. Key contributions include an adversarially trained intermediate domain, an ID-aware discriminator that focuses on identity features, and a dual-triplet/color-free loss regime that yields modality-invariant representations. AGPI$^2$ achieves state-of-the-art results on SYSU-MM01 and RegDB without increasing inference cost. The method is also compatible with other VI-ReID models, where it yields substantial performance gains without adding computational overhead at test time, which highlights its practical value for robust cross-modal person re-identification. Overall, AGPI$^2$ offers a principled, efficient way to bridge large modality gaps and may extend to broader cross-modal retrieval tasks.

Abstract

Visible-infrared person re-identification (V-I ReID) seeks to retrieve images of the same individual captured over a distributed network of RGB and IR sensors. Several V-I ReID approaches directly integrate both V and I modalities to discriminate persons within a shared representation space. However, given the significant gap in data distributions between the V and I modalities, cross-modal V-I ReID remains challenging. Some recent approaches improve generalization by leveraging intermediate spaces that can bridge the V and I modalities, yet effective methods are still needed to select or generate data for such informative domains. In this paper, the Adaptive Generation of Privileged Intermediate Information (AGPI$^2$) training approach is introduced to adapt and generate a virtual domain that bridges discriminant information between the V and I modalities. The key motivation behind AGPI$^2$ is to enhance the training of a deep V-I ReID backbone by generating privileged images that provide additional information. These privileged images capture shared discriminative features that are not easily accessible within the original V or I modalities alone. Towards this goal, a non-linear generative module is trained with an adversarial objective to translate V images into intermediate spaces with a smaller domain shift with respect to the I domain. Meanwhile, the embedding module within AGPI$^2$ aims to produce similar features for both V and generated images, encouraging the extraction of features that are common to all modalities. In addition, AGPI$^2$ employs adversarial objectives for adapting the intermediate images, which play a crucial role in creating a non-modality-specific space that addresses the large domain shift between the V and I domains. Experimental results on challenging V-I ReID datasets indicate that AGPI$^2$ increases matching accuracy without requiring extra computational resources during inference.
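The abstract states that the embedding module should produce similar features for V images and their generated counterparts. One common way to express this, and the distance-based form that Figure 1 describes with $D_{a,p}$ and $D_{a,n}$, is a hinge triplet loss. The sketch below is a minimal illustration under assumptions: Euclidean distance, a margin of 0.3, and the hand-picked 2-D feature vectors are all hypothetical choices rather than the paper's settings, and the paper's dual-triplet loss may combine such terms differently.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin: float = 0.3) -> float:
    """Hinge triplet loss: positive when D(a,p) + margin exceeds D(a,n).

    The margin value is a hypothetical choice for illustration.
    """
    d_ap = euclidean(anchor, positive)
    d_an = euclidean(anchor, negative)
    return max(0.0, d_ap - d_an + margin)


# Hypothetical 2-D features: a V image (anchor), the generated intermediate
# image of the same person (positive), and a V image of another person (negative).
f_v = [1.0, 0.0]
f_z_same = [0.9, 0.1]
f_v_other = [-1.0, 0.0]

print(triplet_loss(f_v, f_z_same, f_v_other))   # well-separated triplet
print(triplet_loss(f_v, f_v_other, f_z_same))   # swapped roles: violated triplet
```

The first call returns zero because the same-identity generated feature is already much closer than the other identity; swapping positive and negative produces a positive loss that gradient descent would push down.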

Paper Structure

This paper contains 31 sections, 1 theorem, 15 equations, 10 figures, 15 tables, and 1 algorithm.

Key Result

Proposition 1

Let $Z$ and $Y$ be random variables with domains $\mathcal{Z}$ and $\mathcal{Y}$, respectively. Minimizing the conditional cross-entropy loss of the predicted label $\hat{Y}$, denoted by $\mathcal{H}(Y; \hat{Y}|Z)$, is equivalent to maximizing the mutual information $\text{MI}(Z; Y)$.
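The proposition rests on a standard identity. Mutual information decomposes as $\text{MI}(Z; Y) = H(Y) - H(Y|Z)$, where $H(Y)$ is fixed by the data and does not depend on the model, and the cross-entropy $\mathbb{E}_{p(z,y)}[-\log q(y|z)]$ of any predictor $q$ upper-bounds $H(Y|Z)$, with equality when $q(y|z) = p(y|z)$. In this standard argument, minimizing cross-entropy maximizes the lower bound $H(Y) - \text{CE}$ on $\text{MI}(Z; Y)$, which becomes tight as $q$ approaches the true posterior. The Python sketch below checks these relations numerically on a hypothetical 2×2 joint distribution; it illustrates the general argument and is not a reproduction of the paper's Proof 1.

```python
import math
from typing import Dict, Tuple

Joint = Dict[Tuple[int, int], float]  # maps (z, y) -> probability


def marginal(p: Joint, axis: int) -> Dict[int, float]:
    """Marginal distribution over z (axis=0) or y (axis=1)."""
    out: Dict[int, float] = {}
    for key, v in p.items():
        out[key[axis]] = out.get(key[axis], 0.0) + v
    return out


def entropy_y(p: Joint) -> float:
    """H(Y) in nats."""
    return -sum(v * math.log(v) for v in marginal(p, 1).values() if v > 0)


def cond_entropy_y_given_z(p: Joint) -> float:
    """H(Y|Z) = -sum_{z,y} p(z,y) log p(y|z), in nats."""
    pz = marginal(p, 0)
    return -sum(v * math.log(v / pz[z]) for (z, _), v in p.items() if v > 0)


def mutual_information(p: Joint) -> float:
    """MI(Z;Y) = H(Y) - H(Y|Z)."""
    return entropy_y(p) - cond_entropy_y_given_z(p)


def cross_entropy(p: Joint, q: Joint) -> float:
    """Expected -log q(y|z) under p(z, y); q maps (z, y) -> q(y|z)."""
    return -sum(v * math.log(q[(z, y)]) for (z, y), v in p.items() if v > 0)


# Hypothetical joint distribution in which Z carries partial information about Y.
p_zy: Joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

# A predictor equal to the true posterior p(y|z), and a less confident one.
q_exact: Joint = {(0, 0): 0.8, (0, 1): 0.2, (1, 0): 0.2, (1, 1): 0.8}
q_rough: Joint = {(0, 0): 0.6, (0, 1): 0.4, (1, 0): 0.4, (1, 1): 0.6}

mi = mutual_information(p_zy)
for name, q in [("exact", q_exact), ("rough", q_rough)]:
    ce = cross_entropy(p_zy, q)
    # H(Y) - CE is a lower bound on MI(Z;Y); it is tight for q_exact.
    print(name, "CE:", round(ce, 4), "bound:", round(entropy_y(p_zy) - ce, 4), "MI:", round(mi, 4))
```

The less confident predictor has a higher cross-entropy and therefore a looser lower bound on $\text{MI}(Z; Y)$, which is the sense in which driving cross-entropy down drives the bound on mutual information up.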

Figures (10)

  • Figure 1: An illustration of the training strategy of our AGPI$^2$ approach. The generated images are learned to resemble the infrared modality through adversarial objectives. (a) The feature embedding stage pushes the extracted features toward the intermediate domain, while (b) the generation stage transforms V images into an intermediate domain that approaches I images. The objective of feature embedding is to minimize the distance from the anchor to the positive ($D_{a,p}$) and maximize its distance from the negative samples ($D_{a,n}$). Let $c^v$, $c^z$, and $c^i$ be the centers of the per-class feature distributions of V, intermediate, and I images. The generator tries to create a space in which the ID-aware features of each person are close to both the infrared and visible features at the same time. To reach this goal without biasing the transferred images toward the visible modality, we push them further toward infrared: $D_{c^z, c^i} < D_{c^z, c^v} + M_1$.
  • Figure 2: Block diagram of our proposed AGPI$^2$ training strategy for V-I person ReID. It takes advantage of the adaptive intermediate domain to reduce the distribution gap between the V and I domains. The feature embedding module seeks to minimize the distance between the two original (I and V) modalities by using the auxiliary generated intermediate domain. The ID-modality discriminator learns to detect the modality from ID-aware features, and the generation process aims to transform V images to infrared through adversarial training with the discriminator.
  • Figure 3: ID-Modality discriminator vs. general discriminator. Our label space is doubled to account for identity, so that the discriminator differentiates between individuals within each modality.
  • Figure 4: Examples of images generated with our AGPI$^2$ for the SYSU-MM01 dataset. Columns (a) and (b) are V and I images. The intermediate and reconstructed I images are shown in (c) and (d), respectively. Columns (e) to (h) show the GradCAM maps of our ID-Modality discriminator vs. a binary discriminator: (e) and (f) for the AGPI$^2$ discriminator on V and I images, respectively, and (g) and (h) for the binary discriminator.
  • Figure 5: Distribution of between- and within-class distances on the SYSU-MM01 dataset for the (a, c) train and (b, d) test sets with the baseline model and our AGPI$^2$ model.
  • ...and 5 more figures
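Two mechanisms from the captions above can be made concrete. Figure 3's ID-Modality discriminator doubles the label space, so each (identity, modality) pair gets its own class, and Figure 1 constrains the class centers with $D_{c^z, c^i} < D_{c^z, c^v} + M_1$. The Python sketch below shows one way to encode both, assuming a hypothetical label layout (visible identities first, then infrared), class centers computed as feature means, Euclidean distance, and a hinge penalty that is zero whenever the Figure 1 inequality holds. Function names, the value of $M_1$, and the feature values are illustrative, not taken from the paper's code.

```python
import math
from typing import List, Sequence


def id_modality_label(identity: int, modality: str, num_ids: int) -> int:
    """Map (identity, modality) to one of 2 * num_ids discriminator classes.

    Hypothetical layout: visible IDs occupy [0, num_ids),
    infrared IDs occupy [num_ids, 2 * num_ids).
    """
    if not 0 <= identity < num_ids:
        raise ValueError("identity out of range")
    if modality == "V":
        return identity
    if modality == "I":
        return num_ids + identity
    raise ValueError("modality must be 'V' or 'I'")


def class_center(features: Sequence[Sequence[float]]) -> List[float]:
    """Mean feature vector of one identity in one domain (assumed definition of c)."""
    n = len(features)
    return [sum(col) / n for col in zip(*features)]


def center_penalty(c_z, c_i, c_v, m1: float) -> float:
    """Hinge on the Figure 1 constraint D(c^z, c^i) < D(c^z, c^v) + M_1.

    Returns 0 when D(c^z, c^i) - D(c^z, c^v) <= M_1, and the size of the
    violation otherwise. Distances are Euclidean (an assumption).
    """
    return max(0.0, math.dist(c_z, c_i) - math.dist(c_z, c_v) - m1)


num_ids = 4
print(id_modality_label(2, "V", num_ids), id_modality_label(2, "I", num_ids))  # 2 6

c_v = class_center([[0.0, 0.0], [0.2, 0.0]])  # [0.1, 0.0]
c_i = class_center([[1.0, 0.0], [1.0, 0.2]])  # [1.0, 0.1]

# Intermediate center near infrared: constraint holds, no penalty.
print(center_penalty([0.7, 0.05], c_i, c_v, m1=0.1))
# Intermediate center collapsed onto visible: constraint violated, positive penalty.
print(center_penalty([0.2, 0.0], c_i, c_v, m1=0.1))
```

Keeping the label mapping explicit makes it easy to see why the discriminator must reason about identity as well as modality: the same person contributes two distinct classes, one per modality.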

Theorems & Definitions (2)

  • Proposition 1
  • Proof 1