Extended Cross-Modality United Learning for Unsupervised Visible-Infrared Person Re-identification

Ruixing Wu, Yiming Yang, Jiakai He, Haifeng Hu

TL;DR

This work tackles unsupervised visible-infrared person re-identification by addressing cross-modality clustering challenges and inter-modality gaps. It proposes Extended Cross-Modality United Learning (ECUL), which fuses cluster-level and instance-level contrastive learning with cross-modal memory aggregation, and introduces two novel modules: Extended Modality-Camera Clustering (EMCC) and Two-Step Memory Updating (TSMem). EMCC refines clustering by enforcing camera- and modality-aware constraints and by using a two-tier neighborhood (k2 and k3) to filter negatives while fusing positives. TSMem updates memory in two stages to preserve diversity early in training and enhance generalization later. Experiments on SYSU-MM01 and RegDB show that ECUL achieves state-of-the-art or competitive results among unsupervised methods and surpasses several supervised approaches. This suggests that annotation-free cross-modality Re-ID is viable, which matters for practical security applications.
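The TL;DR mentions fusing cluster-level and instance-level contrastive learning over memory banks. The exact loss forms, temperature, and weighting are not given here, so the following is a minimal sketch assuming InfoNCE-style losses against a cluster memory (one proxy per pseudo-label) and an instance memory (one entry per training image), as is common in cluster-contrast methods. The names `info_nce`, `united_loss`, `weight`, and `temperature` are hypothetical.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def info_nce(query, memory, positive_idx, temperature=0.05):
    """Mean InfoNCE loss of each query against every memory entry.

    query: (B, D) features; memory: (K, D) proxies;
    positive_idx: (B,) index of each query's positive memory entry.
    """
    logits = l2_normalize(query) @ l2_normalize(memory).T / temperature  # (B, K)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(len(query)), positive_idx].mean())

def united_loss(feats, pseudo_labels, cluster_memory, instance_memory,
                instance_idx, weight=1.0, temperature=0.05):
    """Hypothetical combination: cluster-level term plus weighted instance-level term."""
    cluster_term = info_nce(feats, cluster_memory, pseudo_labels, temperature)
    instance_term = info_nce(feats, instance_memory, instance_idx, temperature)
    return cluster_term + weight * instance_term
```

The cluster term pulls each feature toward its pseudo-label proxy, while the instance term keeps individual samples distinguishable; how ECUL actually balances the two is not specified in this summary.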

Abstract

Unsupervised learning for visible-infrared person re-identification (USL-VI-ReID) aims to learn modality-invariant features from unlabeled cross-modality datasets and to reduce the inter-modality gap. However, existing methods either lack cross-modality clustering or pursue cluster-level association excessively, which makes reliable modality-invariant feature learning difficult. To address this issue, we propose an Extended Cross-Modality United Learning (ECUL) framework that incorporates Extended Modality-Camera Clustering (EMCC) and a Two-Step Memory Updating Strategy (TSMem). Specifically, ECUL is designed to naturally integrate intra-modality clustering, inter-modality clustering, and inter-modality instance selection, establishing compact and accurate cross-modality associations while reducing the introduction of noisy labels. Moreover, EMCC captures and filters neighborhood relationships by extending the encoding vector, which further promotes the learning of modality-invariant and camera-invariant knowledge at the clustering stage. Finally, TSMem provides accurate and generalized proxy points for contrastive learning by updating the memory in stages. Extensive experimental results on the SYSU-MM01 and RegDB datasets demonstrate that ECUL achieves promising performance and even outperforms certain supervised methods.
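The abstract does not say how visible and infrared clusters are associated during inter-modality clustering. One common option in the USL-VI-ReID literature is to link clusters whose centroids are mutual nearest neighbors across modalities. The sketch below assumes that rule and cosine similarity; it is an illustration, not ECUL's actual association step, and `mutual_nearest_clusters` is a hypothetical name.

```python
import numpy as np

def mutual_nearest_clusters(vis_centroids, ir_centroids):
    """Return (visible_id, infrared_id) pairs whose centroids are mutual nearest neighbors.

    Non-mutual pairs are left unmatched, which is one simple way to avoid
    forcing noisy cross-modality labels.
    """
    v = vis_centroids / np.linalg.norm(vis_centroids, axis=1, keepdims=True)
    r = ir_centroids / np.linalg.norm(ir_centroids, axis=1, keepdims=True)
    sim = v @ r.T                        # (num_visible, num_infrared) cosine similarity
    best_ir_for_vis = sim.argmax(axis=1)  # nearest infrared cluster per visible cluster
    best_vis_for_ir = sim.argmax(axis=0)  # nearest visible cluster per infrared cluster
    return [(i, int(j)) for i, j in enumerate(best_ir_for_vis)
            if best_vis_for_ir[j] == i]
```

Requiring the match to hold in both directions trades recall for precision: some clusters stay modality-specific rather than receiving a possibly wrong cross-modality label.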
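TSMem is described only as updating the memory "in stages". As one plausible reading, the sketch below switches between two momentum-update rules at a chosen epoch: early on, every instance in the batch updates its cluster proxy in turn, so individual samples can pull the proxy; later, each proxy is updated once with the batch mean of its instances, which smooths it. The two rules, `switch_epoch`, `momentum`, and `update_memory` are all assumptions, not the paper's specification.

```python
import numpy as np

def update_memory(memory, feats, labels, epoch, switch_epoch=20, momentum=0.2):
    """Two-stage momentum update of cluster proxies (hypothetical rules).

    Stage 1 (epoch < switch_epoch): each instance updates its proxy in turn.
    Stage 2: each proxy is updated once with the mean of its batch instances.
    Proxies are re-normalized to unit length after every update.
    """
    memory = memory.copy()  # leave the caller's memory untouched
    if epoch < switch_epoch:
        for f, y in zip(feats, labels):
            m = momentum * memory[y] + (1 - momentum) * f
            memory[y] = m / np.linalg.norm(m)
    else:
        for y in np.unique(labels):
            mean_feat = feats[labels == y].mean(axis=0)
            m = momentum * memory[y] + (1 - momentum) * mean_feat
            memory[y] = m / np.linalg.norm(m)
    return memory
```

With identical inputs, the stage-1 rule moves a proxy further than the stage-2 rule, because several sequential updates compound where the batch mean contributes only once.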

Paper Structure

This paper contains 18 sections, 9 equations, 2 figures, and 2 tables.

Figures (2)

  • Figure 1: The structure of our ECUL framework for USL-VI-ReID. The framework is built on cluster-level and instance-level contrastive losses and integrates cross-modal clustering with inter-modality instance selection. It includes the EMCC and TSMem modules.
  • Figure 2: Illustration of the clustering results of a common clustering method and of our EMCC. Different colors represent different identities, while different shapes represent different modalities or cameras. In EMCC, $k_2$ covers a small neighborhood range and $k_3$ a large one, which effectively filters out hard negative information during the clustering phase.
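Figure 2 describes EMCC as combining a small neighborhood ($k_2$) with a large one ($k_3$) when encoding neighbor relationships. The paper's exact encoding and its camera and modality constraints are not given here, so the following is a simplified sketch in the spirit of k-reciprocal re-ranking: each sample is encoded over its $k_3$ nearest neighbors, the encoding is averaged over its $k_2$ nearest neighbors (local query expansion), and a weighted Jaccard distance is computed between encodings. The reciprocal-neighbor check and the camera/modality terms are omitted, and `two_tier_jaccard_distance` is a hypothetical name; this is not EMCC itself.

```python
import numpy as np

def two_tier_jaccard_distance(feats, k2=2, k3=4):
    """Jaccard distance between neighborhood encodings (simplified sketch, assumes k2 <= k3).

    k3 (large range): each sample is encoded by a sparse vector over its
    k3 nearest neighbors (itself included), weighted by exp(-distance).
    k2 (small range): each encoding is averaged over the sample's k2
    nearest neighbors (local query expansion).
    """
    x = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    dist = 2.0 - 2.0 * (x @ x.T)          # squared Euclidean distance of unit vectors
    order = np.argsort(dist, axis=1)      # column 0 is normally the sample itself
    n = len(feats)
    enc = np.zeros((n, n))
    for i in range(n):
        nbrs = order[i, :k3]
        w = np.exp(-dist[i, nbrs])
        enc[i, nbrs] = w / w.sum()
    enc = np.stack([enc[order[i, :k2]].mean(axis=0) for i in range(n)])
    # Weighted Jaccard distance: 1 - sum(min) / sum(max)
    mins = np.minimum(enc[:, None, :], enc[None, :, :]).sum(axis=2)
    maxs = np.maximum(enc[:, None, :], enc[None, :, :]).sum(axis=2)
    return 1.0 - mins / maxs
```

Samples whose large neighborhoods do not overlap get a distance of exactly 1, so a hard negative that is close in raw feature space but shares few neighbors is pushed apart before clustering.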