CMCC-ReID: Cross-Modality Clothing-Change Person Re-Identification

Haoxuan Xu, Hanzi Wang, Guanglin Niu

Abstract

Person Re-Identification (ReID) faces severe challenges from modality discrepancy and clothing variation in long-term surveillance scenarios. While existing studies have made significant progress in either Visible-Infrared ReID (VI-ReID) or Clothing-Change ReID (CC-ReID), real-world surveillance systems often face both challenges simultaneously. To address this overlooked yet realistic problem, we define a new task, termed Cross-Modality Clothing-Change Re-Identification (CMCC-ReID), which targets pedestrian matching across variations in both modality and clothing. To advance research in this direction, we construct a new benchmark, SYSU-CMCC, where each identity is captured in both the visible and infrared domains with distinct outfits, reflecting the dual heterogeneity of long-term surveillance. To tackle CMCC-ReID, we propose a Progressive Identity Alignment Network (PIA) that progressively mitigates the issues of clothing variation and modality discrepancy. Specifically, a Dual-Branch Disentangling Learning (DBDL) module separates identity-related cues from clothing-related factors to achieve clothing-agnostic representations, and a Bi-Directional Prototype Learning (BPL) module performs intra-modality and inter-modality contrast in the embedding space to bridge the modality gap while further suppressing clothing interference. Extensive experiments on the SYSU-CMCC dataset demonstrate that PIA establishes a strong baseline for this new task and significantly outperforms existing methods.
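
The abstract describes BPL only at a high level. As a reading aid, the following PyTorch sketch shows one plausible form of bi-directional (intra- plus inter-modality) prototype contrast; the function names, the temperature `tau`, and the use of per-identity prototypes (e.g., running class means) are our assumptions, not the paper's specification.

```python
import torch.nn.functional as F


def prototype_contrast(features, labels, prototypes, proto_ids, tau=0.1):
    """InfoNCE-style contrast of embeddings against identity prototypes.

    features:   (B, D) image embeddings
    labels:     (B,)   identity labels of the embeddings
    prototypes: (P, D) one prototype per identity (assumed: running class means)
    proto_ids:  (P,)   identity label of each prototype
    """
    features = F.normalize(features, dim=1)
    prototypes = F.normalize(prototypes, dim=1)
    logits = features @ prototypes.t() / tau  # (B, P) scaled cosine similarities
    pos_mask = (labels.unsqueeze(1) == proto_ids.unsqueeze(0)).float()
    log_prob = F.log_softmax(logits, dim=1)
    # pull each embedding toward the prototype(s) of its own identity
    return -(pos_mask * log_prob).sum(1).div(pos_mask.sum(1).clamp(min=1)).mean()


def bpl_loss(f_vis, y_vis, f_ir, y_ir, proto_vis, proto_ir, proto_ids, tau=0.1):
    # intra-modality terms: features vs. same-modality prototypes
    # (suppresses clothing-induced scatter within each modality)
    intra = (prototype_contrast(f_vis, y_vis, proto_vis, proto_ids, tau)
             + prototype_contrast(f_ir, y_ir, proto_ir, proto_ids, tau))
    # inter-modality terms: features vs. opposite-modality prototypes
    # (bridges the visible-infrared gap)
    inter = (prototype_contrast(f_vis, y_vis, proto_ir, proto_ids, tau)
             + prototype_contrast(f_ir, y_ir, proto_vis, proto_ids, tau))
    return intra + inter
```

The bi-directional structure is the point of the sketch: each modality's features are contrasted both against their own modality's prototypes and against the other modality's, so identity alignment and modality bridging are optimized jointly.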

Paper Structure

This paper contains 13 sections, 15 equations, 5 figures, and 2 tables.

Figures (5)

  • Figure 1: Motivation of CMCC-ReID. In long-term surveillance scenarios, the two critical issues (i.e., modality discrepancy and clothing variation) are not mutually exclusive but coexist. This observation motivates the CMCC-ReID task, which targets pedestrian matching across both modality and clothing changes.
  • Figure 2: Illustration of the key issues in CMCC-ReID. (a) In the visible modality, the high contrast enables CAL to produce a heatmap with suppressed clothing cues. (b) In the infrared modality, low contrast causes CAL to generate a heatmap dominated by clothing information. (c) Comparison between SAAI and its ResNet-50 baseline shows that direct modality alignment may even degrade performance under clothing variation. (d) Integrating SAAI into CAL yields moderate improvements once clothing interference is mitigated.
  • Figure 3: Overview of PIA, which consists of two key components: a Dual-Branch Disentangling Learning (DBDL) module for clothing-invariant feature extraction in both modalities, and a Bi-Directional Prototype Learning (BPL) module for cross-modality identity alignment. Through progressive optimization, PIA achieves robust, modality-consistent identity representations. (A minimal sketch of the disentangling head follows this list.)
  • Figure 4: (a) Results of parameter sensitivity analysis for key hyperparameters in our model. (b) Visualization of retrieval results.
  • Figure 5: Visualizations of (a) cosine distance distributions and (b) attention maps.
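
The DBDL module in Figure 3 is likewise described only in outline. The sketch below shows one plausible two-branch disentangling head, assuming both identity and clothing labels are available for supervision; the layer layout, feature dimension, and class counts are hypothetical placeholders rather than the paper's design.

```python
import torch.nn as nn


class DualBranchDisentangler(nn.Module):
    """Hypothetical two-branch head: one branch retains identity cues,
    the other absorbs clothing-related factors. All names and sizes are
    illustrative assumptions, not the paper's exact architecture."""

    def __init__(self, feat_dim=2048, num_ids=500, num_clothes=1000):
        super().__init__()
        self.id_branch = nn.Sequential(
            nn.Linear(feat_dim, feat_dim), nn.BatchNorm1d(feat_dim))
        self.cloth_branch = nn.Sequential(
            nn.Linear(feat_dim, feat_dim), nn.BatchNorm1d(feat_dim))
        self.id_classifier = nn.Linear(feat_dim, num_ids)
        self.cloth_classifier = nn.Linear(feat_dim, num_clothes)

    def forward(self, backbone_feat):
        f_id = self.id_branch(backbone_feat)        # identity-related component
        f_cloth = self.cloth_branch(backbone_feat)  # clothing-related component
        return f_id, f_cloth, self.id_classifier(f_id), self.cloth_classifier(f_cloth)
```

Under this reading, training would supervise f_id with identity labels and f_cloth with clothing labels, optionally adding an orthogonality or adversarial term so that clothing information is pushed out of the identity branch; the resulting f_id would then feed the BPL contrast sketched earlier.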