TSDW: A Tri-Stream Dynamic Weight Network for Cloth-Changing Person Re-Identification
Ruiqi He, Zihan Wang, Xiang Zhou
TL;DR
This work tackles cloth-changing person re-identification (CC-ReID) by introducing a Tri-Stream Dynamic Weight Network (TSDW) that fuses facial, head-limb, and global features. It uses Self-Correction for Human Parsing (SCHP) to generate region-specific inputs and processes them through three tailored streams, whose outputs are combined by a hierarchical, confidence-based three-way decision mechanism. A Clothes-based Adversarial Loss (CAL) balances identity and clothing cues in the global stream, so the representation stays robust when clothing varies or features are occluded. On Celeb-reID, PRCC, and VC-Clothes, TSDW achieves state-of-the-art performance, showing improved accuracy and robustness under varying data quality and camera conditions.
Abstract
Cloth-Changing Person Re-identification (CC-ReID) aims to identify individuals across different times, locations, viewpoints, and clothing. The task is attracting growing attention in big data research and public security. Existing ReID research relies primarily on face recognition, gait semantic recognition, and clothing-irrelevant feature identification, which perform relatively well on high-quality videos and images of clothing changes. However, these approaches depend on either a single feature or a simple combination of several features, which makes further performance gains difficult. In addition, missing facial information, the difficulty of extracting gait, and inconsistent camera parameters restrict the broader application of CC-ReID. To address these limitations, we propose a Tri-Stream Dynamic Weight Network (TSDW) that requires only images as input. The network consists of three parallel feature streams: facial features, head-limb features, and global features. Each stream specializes in extracting its designated features, after which a gating network dynamically fuses the streams according to their confidence levels. Running the three streams in parallel improves recognition performance and reduces the impact of any single feature failing, which improves model robustness. Extensive experiments on benchmark datasets (e.g., PRCC, Celeb-reID, VC-Clothes) demonstrate that our method significantly outperforms existing state-of-the-art approaches.
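To make the idea of confidence-based fusion concrete, the following is a minimal sketch and not the authors' implementation. It assumes each stream produces a query-gallery similarity score and a scalar confidence, that the gate is a plain softmax over the confidences, and that an unavailable stream (for example, no visible face) is simply masked out. The stream names, the `temperature` parameter, and both function names are hypothetical; the paper's gating network is learned and its three-way decision mechanism is not reproduced here.

```python
import math
from typing import Dict, Optional

# Illustrative stream names; the paper's actual identifiers may differ.
STREAMS = ("face", "head_limb", "global")


def gate_weights(
    confidences: Dict[str, Optional[float]], temperature: float = 1.0
) -> Dict[str, float]:
    """Softmax over the confidences of available streams.

    A confidence of None marks a stream as unavailable; it gets weight 0.
    """
    available = {k: v for k, v in confidences.items() if v is not None}
    if not available:
        raise ValueError("at least one stream must be available")
    peak = max(available.values())  # subtract the max for numerical stability
    exps = {k: math.exp((v - peak) / temperature) for k, v in available.items()}
    total = sum(exps.values())
    return {k: exps.get(k, 0.0) / total for k in confidences}


def fuse_scores(
    scores: Dict[str, Optional[float]],
    confidences: Dict[str, Optional[float]],
    temperature: float = 1.0,
) -> float:
    """Confidence-weighted sum of per-stream similarity scores."""
    weights = gate_weights(confidences, temperature)
    # Streams with zero weight (e.g. masked out) are skipped entirely,
    # so their score may be None.
    return sum(weights[k] * scores[k] for k in scores if weights[k] > 0.0)


if __name__ == "__main__":
    # Face is missing, so the fused score depends only on the other two streams.
    scores = {"face": None, "head_limb": 0.62, "global": 0.48}
    confidences = {"face": None, "head_limb": 1.5, "global": 0.5}
    print(gate_weights(confidences))
    print(fuse_scores(scores, confidences))
```

Masking a missing stream and renormalizing over the rest is one simple way to get the behavior the abstract describes, where the failure of a single feature does not break the final decision.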
