Frequency Domain Modality-invariant Feature Learning for Visible-infrared Person Re-Identification

Yulin Li; Tianzhu Zhang; Yongdong Zhang

TL;DR

This work tackles visible-infrared person re-identification (VI-ReID) by identifying amplitude differences in the Fourier spectrum as the primary source of modality discrepancy and proposing a frequency-domain framework, FDMNet. It introduces two modules: an Instance-Adaptive Amplitude Filter (IAF), which suppresses modality-specific amplitude information at the image level, and a Phase-Preserving Normalization (PPNorm), which preserves semantic phase information in the feature space. Together they align the two modalities at both the image and feature levels. A modality adversarial learning scheme further enforces cross-modality invariance, and the combined objective includes identity, center-cluster, and frequency-domain regularizers. Experiments on SYSU-MM01 and RegDB show state-of-the-art performance and strong generalization, and the proposed modules also work as plug-ins for existing VI-ReID methods. This frequency-domain perspective offers a new avenue for cross-modality learning, with practical implications for surveillance systems that operate across lighting conditions.

Abstract

Visible-infrared person re-identification (VI-ReID) is challenging due to the significant cross-modality discrepancies between visible and infrared images. While existing methods have focused on designing complex network architectures or using metric learning constraints to learn modality-invariant features, they often overlook which specific component of the image causes the modality discrepancy problem. In this paper, we first reveal that the difference in the amplitude component of visible and infrared images is the primary factor that causes the modality discrepancy and further propose a novel Frequency Domain modality-invariant feature learning framework (FDMNet) to reduce modality discrepancy from the frequency domain perspective. Our framework introduces two novel modules, namely the Instance-Adaptive Amplitude Filter (IAF) module and the Phase-Preserving Normalization (PPNorm) module, to enhance the modality-invariant amplitude component and suppress the modality-specific component at both the image and feature levels. Extensive experimental results on two standard benchmarks, SYSU-MM01 and RegDB, demonstrate the superior performance of our FDMNet against state-of-the-art methods.
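
To make the amplitude and phase components mentioned in the abstract concrete, the following is a minimal NumPy sketch of the decomposition and of the amplitude-only and phase-only reconstructions. It is an illustration, not the authors' code: the function names `split_amplitude_phase` and `recombine`, the random stand-in image, and the choice of zero as the constant phase and the mean amplitude as the constant amplitude are all assumptions made here.

```python
import numpy as np

def split_amplitude_phase(image):
    """Return amplitude and phase of the 2D Fourier spectrum of a grayscale image."""
    spectrum = np.fft.fft2(image)
    return np.abs(spectrum), np.angle(spectrum)

def recombine(amplitude, phase):
    """Rebuild a spatial image from amplitude * exp(i * phase)."""
    spectrum = amplitude * np.exp(1j * phase)
    # For the spectrum of a real image the imaginary part is only rounding noise.
    return np.real(np.fft.ifft2(spectrum))

# Assumption: a random array stands in for a grayscale pedestrian crop.
rng = np.random.default_rng(0)
image = rng.random((64, 32))

amplitude, phase = split_amplitude_phase(image)

# Using both components recovers the original image.
restored = recombine(amplitude, phase)

# Amplitude-only: replace the phase with a constant (zero is an assumed choice).
amplitude_only = recombine(amplitude, np.zeros_like(phase))

# Phase-only: replace the amplitude with a constant (the mean is an assumed choice).
phase_only = recombine(np.full_like(amplitude, amplitude.mean()), phase)
```

On real photographs, the phase-only image typically keeps object contours while the amplitude-only image does not, which is the kind of comparison a visible and infrared image pair can be put through.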

Paper Structure (16 sections, 14 equations, 4 figures, 5 tables)

Introduction
Related Work
Our Approach
Problem Formulation
Instance-adaptive Amplitude Filter
Phase-Preserving Normalization
Modality Adversarial Learning
Objective Functions
Training and Inference
Experiments
Datasets and Evaluation Protocol
Implementation Details
Comparison with State-of-the-art Methods
Ablation Studies
Further Analysis
...and 1 more sections

Figures (4)

Figure 1: Examples of the amplitude-only and phase-only reconstruction. (a) (d) Original images. (b) (e) Reconstructed images with amplitude information only by setting the phase component to a constant. (c) (f) Reconstructed images with phase information only by setting the amplitude component to a constant.
Figure 2: Examples of images reconstructed from the filtered amplitude component. (a) (e) Original images. (b) (f) Reconstructed images from the low-pass filtered amplitude component and original phase component. (c) (g) Reconstructed images from the middle-pass filtered amplitude component and original phase component. (d) (h) Reconstructed images from the high-pass filtered amplitude component and original phase component. It can be observed that different frequency bands of the amplitude differ in how well they transfer across modalities.
Figure 3: The overall architecture of our proposed Frequency Domain Modality-invariant feature learning framework (FDMNet). Our FDMNet includes an Instance-adaptive Amplitude Filter (IAF) module and a Phase-Preserving Normalization (PPNorm) module. The IAF and the PPNorm modules can enhance the modality-invariant amplitude component and suppress the modality-specific component at image-level and feature-level, respectively.
Figure 4: The detailed architecture of the proposed Phase-Preserving Normalization (PPNorm) Module.
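
The band-filtered reconstructions described in the Figure 2 caption can be sketched as follows. This is a hedged illustration with a fixed, hand-chosen radial mask, not the learned instance-adaptive filter of the IAF module, whose details are not given here. The helper names `radial_frequency` and `filter_amplitude`, the cutoffs 0.1 and 0.3 (in cycles per pixel), and the random stand-in image are assumptions.

```python
import numpy as np

def radial_frequency(shape):
    """Distance of each FFT bin from the zero frequency, in cycles per pixel."""
    fy = np.fft.fftfreq(shape[0])[:, None]
    fx = np.fft.fftfreq(shape[1])[None, :]
    return np.sqrt(fy ** 2 + fx ** 2)

def filter_amplitude(image, low, high):
    """Keep amplitude only where low <= radius < high; leave the phase unchanged."""
    spectrum = np.fft.fft2(image)
    amplitude, phase = np.abs(spectrum), np.angle(spectrum)
    radius = radial_frequency(image.shape)
    mask = (radius >= low) & (radius < high)
    filtered = (amplitude * mask) * np.exp(1j * phase)
    # The mask is symmetric about the zero frequency, so the result is real
    # up to rounding noise.
    return np.real(np.fft.ifft2(filtered))

# Assumption: a random array stands in for a grayscale pedestrian crop.
rng = np.random.default_rng(0)
image = rng.random((64, 32))

# Hypothetical cutoffs that split the spectrum into three disjoint bands.
low_band = filter_amplitude(image, 0.0, 0.1)
mid_band = filter_amplitude(image, 0.1, 0.3)
high_band = filter_amplitude(image, 0.3, np.inf)
```

Because the three masks partition the spectrum and the inverse transform is linear, the three reconstructions sum back to the original image. This makes it straightforward to inspect each band separately for a visible and an infrared image.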
