Frequency Domain Modality-invariant Feature Learning for Visible-infrared Person Re-Identification

Yulin Li, Tianzhu Zhang, Yongdong Zhang

TL;DR

This work tackles visible-infrared person re-identification by identifying amplitude differences in the Fourier spectrum as the primary source of modality discrepancy and proposing a frequency-domain framework, FDMNet. It introduces two novel modules: an Instance-Adaptive Amplitude Filter (IAF) to suppress modality-specific amplitude information at the image level, and a Phase-Preserving Normalization (PPNorm) to preserve semantic phase in the feature space, enabling image- and feature-level alignment. A modality adversarial learning scheme further enforces cross-modality invariance, and the combined objective includes identity, center-cluster, and frequency-domain regularizers. Extensive experiments on SYSU-MM01 and RegDB show state-of-the-art performance and strong generalization, with the proposed modules demonstrated as effective plug-ins for existing VI-ReID methods. This frequency-domain perspective offers a new avenue for cross-modality learning with practical implications for surveillance systems across lighting conditions.

Abstract

Visible-infrared person re-identification (VI-ReID) is challenging due to the significant cross-modality discrepancies between visible and infrared images. While existing methods have focused on designing complex network architectures or using metric learning constraints to learn modality-invariant features, they often overlook which specific component of the image causes the modality discrepancy problem. In this paper, we first reveal that the difference in the amplitude component of visible and infrared images is the primary factor that causes the modality discrepancy and further propose a novel Frequency Domain modality-invariant feature learning framework (FDMNet) to reduce modality discrepancy from the frequency domain perspective. Our framework introduces two novel modules, namely the Instance-Adaptive Amplitude Filter (IAF) module and the Phase-Preserving Normalization (PPNorm) module, to enhance the modality-invariant amplitude component and suppress the modality-specific component at both the image and feature levels. Extensive experimental results on two standard benchmarks, SYSU-MM01 and RegDB, demonstrate the superior performance of our FDMNet against state-of-the-art methods.
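
For reference, the amplitude and phase components mentioned in the abstract can be written with the standard 2D discrete Fourier transform. The symbols below ($x$ for a single-channel image of size $H \times W$, $\mathcal{F}$, $\mathcal{A}$, $\mathcal{P}$, and $R$, $I$ for the real and imaginary parts) are notation assumed here for illustration and may differ from the paper's own.

```latex
\begin{aligned}
\mathcal{F}(x)(u,v) &= \sum_{h=0}^{H-1}\sum_{w=0}^{W-1} x(h,w)\, e^{-j 2\pi \left(\frac{hu}{H} + \frac{wv}{W}\right)}
                     = R(x)(u,v) + j\, I(x)(u,v), \\
\mathcal{A}(x)(u,v) &= \sqrt{R(x)(u,v)^2 + I(x)(u,v)^2}, \\
\mathcal{P}(x)(u,v) &= \operatorname{atan2}\!\big(I(x)(u,v),\, R(x)(u,v)\big), \\
x &= \mathcal{F}^{-1}\!\big(\mathcal{A}(x)\, e^{\,j \mathcal{P}(x)}\big).
\end{aligned}
```

The last line states that amplitude and phase together determine the image, so altering the amplitude while keeping the phase fixed is a well-defined operation.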

Paper Structure

This paper contains 16 sections, 14 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: Examples of amplitude-only and phase-only reconstruction. (a), (d) Original images. (b), (e) Images reconstructed from amplitude information only, with the phase component set to a constant. (c), (f) Images reconstructed from phase information only, with the amplitude component set to a constant.
  • Figure 2: Examples of images reconstructed from filtered amplitude components. (a), (e) Original images. (b), (f) Images reconstructed from the low-pass filtered amplitude component and the original phase component. (c), (g) Images reconstructed from the middle-pass filtered amplitude component and the original phase component. (d), (h) Images reconstructed from the high-pass filtered amplitude component and the original phase component. Different frequency components of the amplitude differ in how well they transfer across modalities.
  • Figure 3: The overall architecture of our proposed Frequency Domain Modality-invariant feature learning framework (FDMNet). Our FDMNet includes an Instance-Adaptive Amplitude Filter (IAF) module and a Phase-Preserving Normalization (PPNorm) module. The IAF and PPNorm modules enhance the modality-invariant amplitude component and suppress the modality-specific component at the image level and the feature level, respectively.
  • Figure 4: The detailed architecture of the proposed Phase-Preserving Normalization (PPNorm) module.
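
The reconstructions described in the captions of Figures 1 and 2 can be approximated with a few lines of NumPy. This is a minimal sketch rather than the authors' code: the function names, the constant values, and the radial band definition (cutoffs in cycles per pixel) are assumptions made here for illustration, and it handles a single-channel image only.

```python
import numpy as np

def split_amplitude_phase(img):
    """Return (amplitude, phase) of the 2D DFT of a single-channel image."""
    spectrum = np.fft.fft2(img)
    return np.abs(spectrum), np.angle(spectrum)

def combine(amplitude, phase):
    """Inverse 2D DFT of amplitude * exp(j * phase); keeps the real part."""
    return np.fft.ifft2(amplitude * np.exp(1j * phase)).real

def amplitude_only(img, const_phase=0.0):
    """Figure 1 (b)/(e) style: keep the amplitude, set the phase to a constant."""
    amp, _ = split_amplitude_phase(img)
    return combine(amp, np.full_like(amp, const_phase))

def phase_only(img, const_amp=1.0):
    """Figure 1 (c)/(f) style: keep the phase, set the amplitude to a constant."""
    _, pha = split_amplitude_phase(img)
    return combine(np.full_like(pha, const_amp), pha)

def band_filter_amplitude(img, r_low, r_high):
    """Figure 2 style: keep amplitude whose radial frequency lies in
    [r_low, r_high) and reconstruct with the original phase.
    The cutoffs are hypothetical; the paper's filter settings are not given here."""
    amp, pha = split_amplitude_phase(img)
    h, w = img.shape
    fy = np.fft.fftfreq(h)[:, None]   # vertical frequencies, cycles per pixel
    fx = np.fft.fftfreq(w)[None, :]   # horizontal frequencies, cycles per pixel
    radius = np.sqrt(fy ** 2 + fx ** 2)
    mask = (radius >= r_low) & (radius < r_high)
    return combine(amp * mask, pha)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((64, 32))      # stand-in for a grayscale pedestrian crop
    low = band_filter_amplitude(image, 0.0, 0.1)
    mid = band_filter_amplitude(image, 0.1, 0.3)
    high = band_filter_amplitude(image, 0.3, 1.0)
    # The three bands partition the spectrum, so they sum back to the image.
    print(np.allclose(low + mid + high, image))
```

Because the three bands in the usage example do not overlap and together cover every frequency, their reconstructions add up to the original image. That makes it easy to compare, band by band, how much a visible image and an infrared image of the same person differ.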