
See What You Seek: Semantic Contextual Integration for Cloth-Changing Person Re-Identification

Xiyu Han, Xian Zhong, Wenxin Huang, Xuemei Jia, Xiaohan Yu, Alex Chichung Kot

TL;DR

This work tackles cloth-changing person re-identification (CC-ReID) by introducing Semantic Contextual Integration (SCI), a prompt-learning framework built on CLIP. SCI comprises two key components. The Semantic Separation Enhancement (SSE) module learns dual prompts and uses orthogonalization to disentangle ID-relevant semantics from clothing factors. The Semantic-Guided Interaction Module (SIM) uses the refined text features to guide visual representations through a non-local, text-informed attention mechanism. The method optimizes a joint objective that aligns image and text features while suppressing clothing-related cues. It achieves state-of-the-art results on LTCC, PRCC, and VC-Clothes and generalizes well to cloth-consistent datasets such as Market-1501 and MSMT17. Overall, SCI demonstrates the value of cross-modal semantic integration and prompt learning for robust CC-ReID, offering a practical approach with strong empirical performance, while prompt interpretability remains an avenue for improvement.
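The TL;DR mentions a joint objective that aligns image and text features while suppressing clothing-related cues. The exact loss terms are not given in this summary, so the following is a hypothetical sketch rather than the authors' loss: a CLIP-style image-to-text cross-entropy over per-identity text features, plus a penalty on cosine similarity with a clothing text feature. The names `joint_loss` and `clothing_penalty`, the temperature `tau = 0.07`, and the weight `lam` are all assumptions chosen for illustration.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def image_to_text_ce(img: Vector, id_texts: List[Vector], label: int,
                     tau: float = 0.07) -> float:
    """CLIP-style cross-entropy of one image against per-identity text features."""
    logits = [cosine(img, t) / tau for t in id_texts]
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[label]


def clothing_penalty(img: Vector, cloth_text: Vector) -> float:
    """Hypothetical suppression term: penalize any alignment with the clothing text feature."""
    return abs(cosine(img, cloth_text))


def joint_loss(img: Vector, id_texts: List[Vector], label: int,
               cloth_text: Vector, lam: float = 0.5) -> float:
    """Hypothetical joint objective: match the true ID text, avoid the clothing text."""
    return image_to_text_ce(img, id_texts, label) + lam * clothing_penalty(img, cloth_text)


# Toy 3-D features: two identity text features and one clothing text feature.
id_texts = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
cloth_text = [0.0, 0.0, 1.0]
clean = [0.9, 0.1, 0.0]   # points toward identity 0, no clothing component
biased = [0.6, 0.1, 0.8]  # same identity direction, but contaminated by clothing
print(joint_loss(clean, id_texts, 0, cloth_text) < joint_loss(biased, id_texts, 0, cloth_text))  # True
```

The clothing-contaminated feature scores a higher loss even though it still points mostly toward the correct identity, which is the behavior a clothing-suppression term is meant to produce.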

Abstract

Cloth-changing person re-identification (CC-ReID) aims to match individuals across surveillance cameras despite variations in clothing. Existing methods typically mitigate the impact of clothing changes or enhance identity (ID)-relevant features, but they often struggle to capture complex semantic information. In this paper, we propose a novel prompt-learning framework, Semantic Contextual Integration (SCI), which leverages the visual-textual representation capabilities of CLIP to reduce clothing-induced discrepancies and strengthen ID cues. Specifically, we introduce the Semantic Separation Enhancement (SSE) module, which employs dual learnable text tokens to disentangle clothing-related semantics from confounding factors, thereby isolating ID-relevant features. Furthermore, we develop a Semantic-Guided Interaction Module (SIM) that uses orthogonalized text features to guide visual representations, sharpening the model's focus on distinctive ID characteristics. This semantic integration improves the discriminative power of the model and enriches the visual context with high-level semantic information. Extensive experiments on three CC-ReID datasets demonstrate that our method outperforms state-of-the-art techniques. The code will be released at https://github.com/hxy-499/CCREID-SCI.
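The abstract says SSE learns two text tokens and passes orthogonalized text features on to SIM. One simple way to make an ID text feature orthogonal to a clothing text feature is a single Gram-Schmidt step that subtracts the projection onto the clothing direction. The sketch below assumes that this is what "orthogonalization" means here, which the summary does not confirm. The feature values and the helper names `orthogonalize` and `normalize` are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def orthogonalize(t_id: Vector, t_cloth: Vector) -> Vector:
    """Subtract from t_id its projection onto t_cloth (one Gram-Schmidt step).

    The result has no component along the clothing direction.
    Assumes t_cloth is non-zero.
    """
    scale = dot(t_id, t_cloth) / dot(t_cloth, t_cloth)
    return [a - scale * c for a, c in zip(t_id, t_cloth)]


def normalize(v: Vector) -> Vector:
    """Scale v to unit L2 norm, as CLIP features usually are before comparison."""
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


# Hypothetical text features produced by the two learnable prompts.
t_id = [0.8, 0.3, 0.5]     # ID prompt feature, partly entangled with clothing
t_cloth = [0.0, 0.0, 1.0]  # clothing prompt feature
t_refined = normalize(orthogonalize(t_id, t_cloth))
print(abs(dot(t_refined, t_cloth)) < 1e-9)  # True
```

In a trained model both prompt features would come from the CLIP text encoder and change during training; the projection step itself stays the same.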

Paper Structure

This paper contains 31 sections, 20 equations, 11 figures, 4 tables, and 2 algorithms.

Figures (11)

  • Figure 1: Comparison of traditional methods and our SCI approach. (a) Traditional methods rely on parsing, gait analysis, skeleton extraction, and data augmentation to suppress clothing effects, incurring significant preprocessing overhead. (b) Our SCI approach directly removes clothing bias within the model and exploits inherent ID-related features from images.
  • Figure 2: Framework of the proposed SCI, comprising two key components: the Semantic Separation Enhancement (SSE) module and the Semantic-Guided Interaction Module (SIM). SSE mitigates clothing bias by removing negative semantic factors, while SIM employs the refined text features to guide visual representations, strengthening cross-modal interaction.
  • Figure 3: Illustration of the SIM process. Textual information refines visual feature extraction to align features with the relevant semantic context.
  • Figure 4: t-SNE visualization of 20 randomly selected classes from LTCC. Colors indicate ground-truth IDs. (a) and (b) depict successive stages of the baseline, while (c) and (d) show the corresponding stages of our method.
  • Figure 5: Visualization of feature maps on LTCC (first row), VC-Clothes (second row), and PRCC (third row). The first column shows the original images, while the second and third columns present the feature maps from the baseline and our method, respectively.
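Figure 3 and the TL;DR describe SIM as a non-local, text-informed attention step in which textual information refines visual features. The summary does not give the exact formulation, so the following is a hypothetical sketch of one standard way to do this: the text feature scores every visual patch, a softmax over all positions produces global weights, and the resulting text-weighted context is added back to each patch as a residual. The function name `text_guided_nonlocal`, the scaled dot-product scoring, and the residual update are assumptions, and the text and patch features are assumed to have been projected to the same dimension beforehand.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # shift by the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def text_guided_nonlocal(patches: List[Vector], text: Vector) -> Tuple[List[Vector], List[float]]:
    """Hypothetical text-guided non-local refinement of visual patch features.

    1. Score every patch against the text feature (scaled dot product).
    2. Softmax over all spatial positions, so weights are global (non-local).
    3. Pool a text-weighted context vector from all patches.
    4. Add that context to every patch as a residual update.
    """
    d = len(text)
    scores = [sum(p_k * t_k for p_k, t_k in zip(p, text)) / math.sqrt(d) for p in patches]
    weights = softmax(scores)
    context = [sum(w * p[k] for w, p in zip(weights, patches)) for k in range(d)]
    refined = [[p_k + c_k for p_k, c_k in zip(p, context)] for p in patches]
    return refined, weights


# Three 2-D patch features and a text feature aligned with the first direction.
patches = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
text = [1.0, 0.0]
refined, weights = text_guided_nonlocal(patches, text)
print(weights.index(max(weights)))  # 0
```

Because the softmax runs over every spatial position, each refined patch receives context from the whole image rather than a local neighborhood, which is what makes the step non-local. A learned module would normally add query, key, and value projections, which are omitted here for brevity.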