
Mix-Modality Person Re-Identification: A New and Practical Paradigm

Wei Liu, Xin Xu, Hua Chang, Xin Yuan, Zheng Wang

TL;DR

This work addresses a gap in visible-infrared cross-modality person re-identification by introducing Mix-Modality Re-Identification (MM-ReID), in which both the query and gallery sets contain a mix of visible and infrared images. It proposes two techniques, the Cross-Identity Discrimination Harmonization Loss (CIDHL) and the Modality Bridge Similarity Optimization Strategy (MBSOS). Working in a hyperspherical feature space, CIDHL uses identity centers to mitigate modality confusion, and MBSOS uses bridge samples to refine cross-modality distances. The approach is validated on RegDB, SYSU-MM01, and LLCM, where adding CIDHL and MBSOS to existing VI-ReID methods consistently improves their performance under mix-modality testing. The proposed paradigm and methods offer a practical pathway toward robust cross-modality retrieval in real-world surveillance.
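To make the mix-modality setting concrete, here is a minimal sketch of how a mixed query/gallery split might be drawn from a pool of images. It assumes a per-image modality flag (0 = visible, 1 = infrared) and reads the mixing ratio as the visible share of the query set. The function name `build_mix_split`, its parameters, and this reading of the ratio are hypothetical; they are not the paper's released protocol.

```python
import numpy as np


def build_mix_split(modality, query_vis_ratio, query_size, seed=0):
    """Hypothetical mix-modality split (not the paper's exact protocol).

    modality: sequence of 0 (visible) / 1 (infrared), one entry per image.
    query_vis_ratio: assumed fraction of the query set drawn from visible
        images, e.g. 0.3 for a 3:7 visible:infrared query mix.
    query_size: total number of query images.
    Every image not drawn into the query goes to the gallery, so the
    gallery is mix-modality as well.
    """
    rng = np.random.default_rng(seed)
    modality = np.asarray(modality)
    vis_idx = np.flatnonzero(modality == 0)
    ir_idx = np.flatnonzero(modality == 1)

    n_vis = int(round(query_vis_ratio * query_size))
    n_ir = query_size - n_vis
    if n_vis > len(vis_idx) or n_ir > len(ir_idx):
        raise ValueError("not enough images of one modality for this ratio")

    # Sample query images per modality without replacement.
    q_vis = rng.choice(vis_idx, size=n_vis, replace=False)
    q_ir = rng.choice(ir_idx, size=n_ir, replace=False)
    query = np.sort(np.concatenate([q_vis, q_ir]))

    # The remaining images (both modalities) form the gallery.
    gallery = np.setdiff1d(np.arange(len(modality)), query)
    return query, gallery
```

With 10 visible and 10 infrared images, `build_mix_split(mod, 0.3, 10)` puts 3 visible and 7 infrared images in the query and leaves 7 visible and 3 infrared images in the gallery. A real protocol would also need to ensure that every query identity has at least one match in the gallery, which this sketch does not check.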

Abstract

Current visible-infrared cross-modality person re-identification research has focused only on the bi-modality mutual retrieval paradigm. We propose a new and more practical mix-modality retrieval paradigm. Existing Visible-Infrared person re-identification (VI-ReID) methods have made progress in the bi-modality mutual retrieval paradigm by learning the correspondence between the visible and infrared modalities. However, when these methods are applied to the new mix-modality paradigm, the modality confusion problem degrades their performance significantly. This paper therefore proposes the Mix-Modality person re-identification (MM-ReID) task, explores how the modality mixing ratio influences performance, and constructs mix-modality test sets for existing datasets according to the new testing paradigm. To solve the modality confusion problem in MM-ReID, we propose a Cross-Identity Discrimination Harmonization Loss (CIDHL) that adjusts the distribution of samples in the hyperspherical feature space. It pulls together the centers of samples with the same identity and pushes apart the centers of samples with different identities, while also aggregating samples that share both modality and identity. Furthermore, we propose a Modality Bridge Similarity Optimization Strategy (MBSOS) that optimizes the cross-modality similarity between the query and the queried samples with the help of similar bridge samples in the gallery. Extensive experiments demonstrate that adding CIDHL and MBSOS generally improves the performance of existing cross-modality methods on MM-ReID.
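The three forces attributed to CIDHL can be illustrated with a forward-only NumPy sketch. This is not the paper's loss. The function name `center_harmonization_loss`, the use of 1 minus cosine similarity as the distance, the per-(identity, modality) centers, the equal weighting of the three terms, and the hinge `margin` are all assumptions made for illustration. A training implementation would express the same terms in an autodiff framework.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Project vectors onto the unit hypersphere along the last axis."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def center_harmonization_loss(feats, labels, modality, margin=0.3):
    """Hypothetical CIDHL-style loss (forward pass only, illustrative).

    feats: (N, D) features; labels: (N,) identity ids;
    modality: (N,) 0 = visible, 1 = infrared.
    """
    f = l2_normalize(np.asarray(feats, dtype=float))
    ids = np.asarray(labels).tolist()
    mods = np.asarray(modality).tolist()

    # One normalized center per (identity, modality) pair present in the batch.
    keys = sorted(set(zip(ids, mods)))
    centers = {}
    for key in keys:
        mask = np.array([(i, m) == key for i, m in zip(ids, mods)])
        centers[key] = l2_normalize(f[mask].mean(axis=0))

    # (1) Aggregate: pull each sample toward its own identity/modality center.
    pull_samples = np.mean([1.0 - f[n] @ centers[(ids[n], mods[n])]
                            for n in range(len(ids))])

    # (2) Pull the visible and infrared centers of the same identity together.
    cross = [1.0 - centers[(i, 0)] @ centers[(i, 1)]
             for i in sorted(set(ids))
             if (i, 0) in centers and (i, 1) in centers]
    pull_centers = float(np.mean(cross)) if cross else 0.0

    # (3) Push apart centers of different identities (same or other modality)
    #     whenever their cosine similarity exceeds the assumed margin.
    push = [max(0.0, float(centers[a] @ centers[b]) - margin)
            for a in keys for b in keys if a[0] != b[0]]
    push_centers = float(np.mean(push)) if push else 0.0

    return float(pull_samples) + pull_centers + push_centers
```

When each identity's visible and infrared samples coincide and different identities are orthogonal, the loss is approximately 0. When features instead cluster by modality (the confusion case illustrated in Figure 1c), the cross-modality pull and the different-identity push both become active. For the 2-D toy data used in the checks, the loss rises to about 1.35.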

Paper Structure

This paper contains 27 sections, 10 equations, 5 figures, 11 tables, and 1 algorithm.

Figures (5)

  • Figure 1: (a) Existing bi-modality mutual retrieval test paradigms for VI-ReID either query an infrared image gallery with a visible probe image or query a visible image gallery with an infrared probe image. (b) Our proposed mix-modality testing paradigm for MM-ReID queries a mix-modality gallery with mix-modality probes. (c) The unique challenge in MM-ReID is interference from samples of different identities in the same modality. In the figure, blue/gray represents visible/infrared samples, respectively, and different shapes represent different identities. Samples of the same modality share similar color and other identity-irrelevant information. As a result, samples of the same modality but different identities lie closer together than samples of the same identity but different modalities, which causes confusion during retrieval and reduces accuracy.
  • Figure 2: An illustration of the challenges in SM-ReID, VI-ReID, and MM-ReID. Different geometries represent different identities, blue/gray represents visible/infrared samples, and green and red lines represent correct and incorrect matches, respectively. (a) SM-ReID mainly faces the challenge of differences between identities (shapes). (b) VI-ReID must additionally handle modality (color) differences. (c) MM-ReID must handle modality confusion (samples of the same modality lying closer together) in addition to identity and modality differences.
  • Figure 3: An illustration of our proposed CIDHL and MBSOS. The mix-modality data are fed into two feature extractors with shared weights. Under the constraints of CIDHL, the centers of same-identity samples from different modalities are pulled closer together, and the centers of different-identity samples, from the same or different modalities, are pushed apart. At the same time, each sample is pulled toward its sample center. During testing, MBSOS optimizes the distances between the extracted features by finding the shortest path through bridge samples in the gallery set, which yields the optimized distance metric $\tilde{d}_{i,j}$.
  • Figure 4: The effect of different modality mixing ratios on AGW performance in terms of Rank-1, mAP, and mINP on the three datasets. Performance generally degrades on all metrics except Rank-1, which rises on some of the datasets. "origin" denotes the unmixed dataset, "mix37" denotes a $3:7$ ratio of visible to infrared images between the query set and the gallery set, and so forth.
  • Figure 5: t-SNE visualization of AGW alone (first row) and AGW with our CIDHL (second row). (a) Mix-modality results. (b) Visible-modality results. (c) Infrared-modality results. Different colors represent different identities; dots denote visible samples and stars denote infrared samples.
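Figure 3 describes MBSOS as finding a shortest path through bridge samples in the gallery. The sketch below gives one hypothetical reading, not the paper's formulation. It allows at most one gallery bridge $k$ between query $i$ and gallery sample $j$, with the two-hop cost $d_{i,k} + d_{k,j}$ under the cosine distance $1 - \cos$. The function name `bridge_refined_distance`, the single-hop limit, and the additive path cost are all assumptions. Because $1 - \cos\theta = 2\sin^2(\theta/2)$ violates the triangle inequality, a nearby same-modality bridge can shorten a long cross-modality hop.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Project row vectors onto the unit hypersphere."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def bridge_refined_distance(query_feats, gallery_feats):
    """Hypothetical MBSOS-style refinement via one gallery bridge sample.

    Returns an (n_q, n_g) matrix where entry (i, j) is the cheaper of the
    direct distance d[i, j] and the best path i -> k -> j through any
    gallery sample k (the paper's exact formulation may differ).
    """
    q = l2_normalize(np.asarray(query_feats, dtype=float))
    g = l2_normalize(np.asarray(gallery_feats, dtype=float))

    d_qg = 1.0 - q @ g.T  # (n_q, n_g) direct query-gallery cosine distance
    d_gg = 1.0 - g @ g.T  # (n_g, n_g) gallery-gallery cosine distance

    # via[i, k, j] = d_qg[i, k] + d_gg[k, j]: cost of going through bridge k.
    via = d_qg[:, :, None] + d_gg[None, :, :]
    return np.minimum(d_qg, via.min(axis=1))
```

In a toy case, the query sits at 0°, a gallery bridge at 60°, and the target at 120° on the unit circle. The direct distance $1 - \cos 120^\circ = 1.5$ then drops to $0.5 + 0.5 = 1.0$ through the bridge. The broadcasted `via` tensor needs $O(n_q n_g^2)$ memory, so a realistic gallery would need chunking or a k-nearest-bridge restriction.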