See What You Seek: Semantic Contextual Integration for Cloth-Changing Person Re-Identification
Xiyu Han, Xian Zhong, Wenxin Huang, Xuemei Jia, Xiaohan Yu, Alex Chichung Kot
TL;DR
This work tackles cloth-changing person re-identification (CC-ReID) by introducing Semantic Contextual Integration (SCI), a prompt-learning framework built on CLIP. SCI comprises two key components: Semantic Separation Enhancement (SSE), which learns dual prompts to disentangle ID-relevant semantics from clothing factors through orthogonalization, and the Semantic-Guided Interaction Module (SIM), which uses the refined text features to guide visual representations via a non-local, text-informed attention mechanism. The method optimizes a joint objective that aligns image and text features while suppressing clothing-related cues, yielding state-of-the-art results on LTCC, PRCC, and VC-Clothes and generalizing well to cloth-consistent datasets such as Market1501 and MSMT17. Overall, SCI demonstrates the value of cross-modal semantic integration and prompt learning for robust CC-ReID, while leaving room to improve prompt interpretability.
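To make the orthogonalization idea concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the ID-related and clothing-related text features are plain vectors and removes from the former its component along the latter with a single Gram-Schmidt step. The function name `orthogonalize`, the feature dimension of 512, and the random inputs are all hypothetical; the exact formulation used in SCI may differ.

```python
import numpy as np


def orthogonalize(id_feat, cloth_feat, eps=1e-8):
    """Remove from id_feat its component along cloth_feat.

    Both inputs have shape (batch, dim). This is one Gram-Schmidt step,
    used here only as an illustrative stand-in for the paper's procedure.
    """
    # Unit vector along each clothing feature (eps guards against zero norm).
    cloth_norm = np.linalg.norm(cloth_feat, axis=-1, keepdims=True)
    cloth_unit = cloth_feat / (cloth_norm + eps)
    # Projection of the ID feature onto the clothing direction.
    proj = np.sum(id_feat * cloth_unit, axis=-1, keepdims=True) * cloth_unit
    return id_feat - proj


# Hypothetical stand-ins for the text features of the two learned prompts.
rng = np.random.default_rng(0)
id_text = rng.normal(size=(4, 512))
cloth_text = rng.normal(size=(4, 512))

id_orth = orthogonalize(id_text, cloth_text)
# Largest remaining inner product with the clothing features (close to zero).
residual = np.abs(np.sum(id_orth * cloth_text, axis=-1)).max()
```

After this step, each row of `id_orth` has essentially no component along the corresponding clothing feature, which is the property an orthogonality constraint is meant to encourage.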
Abstract
Cloth-changing person re-identification (CC-ReID) aims to match individuals across surveillance cameras despite variations in clothing. Existing methods typically mitigate the impact of clothing changes or enhance identity (ID)-relevant features, but they often struggle to capture complex semantic information. In this paper, we propose a novel prompt learning framework, Semantic Contextual Integration (SCI), which leverages the visual-textual representation capabilities of CLIP to reduce clothing-induced discrepancies and strengthen ID cues. Specifically, we introduce the Semantic Separation Enhancement (SSE) module, which employs dual learnable text tokens to disentangle clothing-related semantics from confounding factors, thereby isolating ID-relevant features. Furthermore, we develop a Semantic-Guided Interaction Module (SIM) that uses orthogonalized text features to guide visual representations, sharpening the model's focus on distinctive ID characteristics. This semantic integration improves the discriminative power of the model and enriches the visual context with high-dimensional insights. Extensive experiments on three CC-ReID datasets demonstrate that our method outperforms state-of-the-art techniques. The code will be released at https://github.com/hxy-499/CCREID-SCI.
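As a rough illustration of how a text feature might guide visual representations, the sketch below implements a simplified text-conditioned attention over patch tokens in NumPy. It is an assumption-laden toy, not the SIM module itself: the text feature scores every patch, the softmax-weighted patch summary is added back to all patches as a residual, and there are no learned projections. The names `text_guided_attention`, `patches`, and `text_feat`, as well as the shapes, are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def text_guided_attention(patches, text_feat):
    """Reweight visual patch tokens using a text feature as the query.

    patches:   (N, D) array of visual tokens (hypothetical shape).
    text_feat: (D,) text feature, e.g. an orthogonalized prompt embedding.
    Returns the refined patches (N, D) and the attention weights (N,).
    """
    d = patches.shape[-1]
    # Scaled dot-product similarity between the text feature and each patch.
    scores = patches @ text_feat / np.sqrt(d)
    weights = softmax(scores)
    # Global, text-weighted summary of all patches.
    context = weights @ patches
    # Residual connection: every patch receives the same global context.
    refined = patches + context
    return refined, weights


rng = np.random.default_rng(0)
patches = rng.normal(size=(6, 8))   # 6 patch tokens of dimension 8
text_feat = rng.normal(size=(8,))
refined, weights = text_guided_attention(patches, text_feat)
```

A full non-local block would typically compute affinities between all pairs of positions and use learned linear projections; the version here keeps only the text-to-patch path to show where the textual signal enters.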
