Frequency Domain Modality-invariant Feature Learning for Visible-infrared Person Re-Identification
Yulin Li, Tianzhu Zhang, Yongdong Zhang
TL;DR
This work tackles visible-infrared person re-identification by identifying amplitude differences in the Fourier spectrum as the primary source of modality discrepancy and proposing a frequency-domain framework, FDMNet. It introduces two novel modules: an Instance-Adaptive Amplitude Filter (IAF) that suppresses modality-specific amplitude information at the image level, and a Phase-Preserving Normalization (PPNorm) that preserves semantic phase information in the feature space, enabling alignment at both the image and feature levels. A modality adversarial learning scheme further enforces cross-modality invariance, and the combined objective includes identity, center-cluster, and frequency-domain regularizers. Extensive experiments on SYSU-MM01 and RegDB show state-of-the-art performance and strong generalization, and the proposed modules are shown to work as effective plug-ins for existing VI-ReID methods. This frequency-domain perspective offers a new avenue for cross-modality learning, with practical implications for surveillance systems that operate across lighting conditions.
Abstract
Visible-infrared person re-identification (VI-ReID) is challenging due to the significant cross-modality discrepancies between visible and infrared images. While existing methods have focused on designing complex network architectures or using metric learning constraints to learn modality-invariant features, they often overlook which specific component of the image causes the modality discrepancy. In this paper, we first reveal that the difference in the amplitude component of visible and infrared images is the primary factor behind the modality discrepancy, and we further propose a novel Frequency Domain modality-invariant feature learning framework (FDMNet) to reduce modality discrepancy from the frequency domain perspective. Our framework introduces two novel modules, namely the Instance-Adaptive Amplitude Filter (IAF) module and the Phase-Preserving Normalization (PPNorm) module, to enhance the modality-invariant amplitude component and suppress the modality-specific component at both the image and feature levels. Extensive experimental results on two standard benchmarks, SYSU-MM01 and RegDB, demonstrate the superior performance of our FDMNet against state-of-the-art methods.
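The amplitude and phase components referred to above come from the 2D Fourier transform of an image. The following is a minimal NumPy sketch of that decomposition and of swapping the amplitude between two images while keeping the phase. It is an illustration only and is not the IAF or PPNorm module: the function names (split_spectrum, merge_spectrum, swap_amplitude) are hypothetical, and the two random arrays merely stand in for a visible and an infrared image.

```python
import numpy as np


def split_spectrum(image):
    """Return the (amplitude, phase) of the 2D Fourier spectrum of an image."""
    spectrum = np.fft.fft2(image, axes=(0, 1))
    return np.abs(spectrum), np.angle(spectrum)


def merge_spectrum(amplitude, phase):
    """Rebuild a real-valued image from an amplitude and a phase spectrum."""
    spectrum = amplitude * np.exp(1j * phase)
    # For spectra that come from real images the imaginary part is only
    # floating-point noise, so it is dropped here.
    return np.fft.ifft2(spectrum, axes=(0, 1)).real


def swap_amplitude(content_image, style_image):
    """Keep the phase of content_image and take the amplitude of style_image."""
    _, phase = split_spectrum(content_image)
    amplitude, _ = split_spectrum(style_image)
    return merge_spectrum(amplitude, phase)


# Assumption: random arrays used as placeholders for real image data.
rng = np.random.default_rng(0)
visible = rng.random((8, 4))
infrared = rng.random((8, 4))

# Splitting and merging without modification recovers the original image.
restored = merge_spectrum(*split_spectrum(visible))

# The mixed image has the phase of `visible` and the amplitude of `infrared`.
mixed = swap_amplitude(visible, infrared)
```

Under the paper's premise, the phase spectrum carries most of the semantic structure while the amplitude spectrum carries most of the modality-specific appearance, so an experiment of this kind is one simple way to inspect each component separately.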
