
TSDW: A Tri-Stream Dynamic Weight Network for Cloth-Changing Person Re-Identification

Ruiqi He, Zihan Wang, Xiang Zhou

TL;DR

This work tackles cloth-changing person re-identification (CC-ReID) with a Tri-Stream Dynamic Weight Network (TSDW) that fuses facial, head-limb, and global features. It uses the SCHP human parser (Self-Correction for Human Parsing) to generate region-specific inputs, processes them through three tailored streams, and combines the streams with a hierarchical, confidence-based three-way decision mechanism. In the global stream, a Clothes-based Adversarial Loss (CAL) balances identity and clothing cues, so that representations stay robust when clothing changes or features are occluded. On Celeb-reID, PRCC, and VC-Clothes, TSDW achieves state-of-the-art performance, showing improved accuracy and robustness in CC-ReID under varying data quality and camera conditions.
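The TL;DR names a hierarchical, confidence-based three-way decision mechanism but does not give its rules. In three-way decision theory, each decision is accept, reject, or defer, and a deferred case is passed on for further evidence. The following is a minimal sketch of that idea applied to the three streams, assuming a fixed stream priority (face, then head-limb, then global) and two hypothetical thresholds; `three_way_cascade`, `ACCEPT`, and `REJECT` are illustrative names, not the authors' implementation.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical thresholds, not taken from the paper.
ACCEPT = 0.8  # confidence at or above this: decide "same person"
REJECT = 0.3  # confidence at or below this: decide "different person"


def three_way_cascade(
    scores: Sequence[Tuple[str, Optional[float]]],
    accept: float = ACCEPT,
    reject: float = REJECT,
) -> Tuple[str, Optional[str]]:
    """Return (decision, deciding_stream) for one query-gallery pair.

    `scores` lists (stream_name, confidence) in priority order, e.g. face,
    then head-limb, then global. A confidence of None marks a stream that is
    unavailable for this pair (for example, no visible face).
    """
    for name, conf in scores:
        if conf is None:
            continue  # missing stream: fall through to the next one
        if conf >= accept:
            return "accept", name
        if conf <= reject:
            return "reject", name
        # reject < conf < accept is the boundary region: defer to the next stream
    return "defer", None  # no stream was confident enough either way
```

For example, `three_way_cascade([("face", None), ("head_limb", 0.55), ("global", 0.86)])` skips the missing face, defers on the ambiguous head-limb score, and returns `("accept", "global")`. Ordering the streams from most to least identity-specific is an assumption made here for illustration.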

Abstract

Cloth-Changing Person Re-identification (CC-ReID) aims to identify individuals across different times and places, viewpoints, and changes of clothing. The field is gaining increasing attention in big-data research and public security. Existing research relies mainly on face recognition, gait recognition, and clothing-irrelevant feature extraction, which perform relatively well on high-quality clothing-change videos and images. However, these approaches depend on either a single feature or a simple combination of several features, which makes further performance gains difficult. In addition, missing facial information, difficulty in extracting gait, and inconsistent camera parameters restrict the broader application of CC-ReID. To address these limitations, we propose a Tri-Stream Dynamic Weight Network (TSDW) that requires only images. The network consists of three parallel feature streams: facial features, head-limb features, and global features. Each stream specializes in extracting its designated features, after which a gating network dynamically fuses the streams according to their confidence levels. The three parallel streams improve recognition performance and reduce the impact of any single feature failing, thereby improving model robustness. Extensive experiments on benchmark datasets (e.g., PRCC, Celeb-reID, VC-Clothes) demonstrate that our method significantly outperforms existing state-of-the-art approaches.
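The abstract states that a gating network fuses the streams by confidence and that this limits the damage when one feature fails. The gate itself is not described here, so the following is a minimal sketch under assumptions: each stream supplies a hypothetical scalar confidence (None when the stream is unavailable), the weights are a temperature-scaled softmax over the available streams, and the fused score is the weighted sum of per-stream similarities. `gate_weights`, `fuse`, and `temperature` are illustrative names, not the authors' API.

```python
import math
from typing import Dict, Optional


def gate_weights(
    confidences: Dict[str, Optional[float]], temperature: float = 1.0
) -> Dict[str, float]:
    """Softmax over the available streams; a None confidence gets weight 0."""
    available = {k: c for k, c in confidences.items() if c is not None}
    if not available:
        raise ValueError("at least one stream must be available")
    peak = max(available.values())  # subtract the max for numerical stability
    exps = {k: math.exp((c - peak) / temperature) for k, c in available.items()}
    total = sum(exps.values())
    return {k: (exps[k] / total if k in exps else 0.0) for k in confidences}


def fuse(
    similarities: Dict[str, Optional[float]],
    confidences: Dict[str, Optional[float]],
    temperature: float = 1.0,
) -> float:
    """Confidence-weighted sum of per-stream similarity scores.

    Assumes a similarity is present for every stream whose confidence is not None.
    """
    weights = gate_weights(confidences, temperature)
    return sum(
        weights[k] * similarities[k] for k in weights if confidences[k] is not None
    )
```

With `confidences = {"face": None, "head_limb": 0.0, "global": 0.0}`, the face stream gets weight 0 and the other two get 0.5 each. Masking before the softmax means a missing face hands its share to the remaining streams instead of dragging the fused score toward zero, which is one simple way to realize the robustness argument in the abstract.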

Paper Structure

This paper contains 18 sections, 14 equations, 5 figures, 3 tables, 1 algorithm.

Figures (5)

  • Figure 1: Comparison of face, head-limb, and global images. Viewing a pair of images from these three perspectives together helps determine whether they belong to the same person. However, facial images are not always available: in the first comparison all three perspectives can be used, while in the second only two are available.
  • Figure 2: The proposed TSDW framework consists of an SCHP preprocessing module, three complementary feature-extraction streams, and a dynamically weighted three-way decision module. The three parallel streams use different strategies to extract feature representations. The decision module adaptively assigns a weight to each stream based on the query and gallery features and outputs a q×g similarity matrix, where q and g are the numbers of images in the query and gallery sets, respectively. This dynamic weighting is intended to improve re-identification matching in complex scenarios.
  • Figure 3: Attention Heat Map of Facial Stream
  • Figure 4: Attention Heat Map of Head-Limb Stream
  • Figure 5: Attention Heat Map of Global Stream
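Figure 1 notes that faces are not always visible, and the Figure 2 caption describes per-pair weighting that yields a q×g similarity matrix. The sketch below shows one way such a matrix can be assembled, assuming cosine similarity per stream and, as a stand-in for the paper's learned dynamic weighting (not specified in this section), equal weights over the streams present in both the query and the gallery image. The dict-based feature layout and the names `cosine` and `similarity_matrix` are hypothetical.

```python
import math
from typing import Dict, List, Optional

Features = Dict[str, Optional[List[float]]]  # stream name -> vector, None if missing


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_matrix(query: List[Features], gallery: List[Features]) -> List[List[float]]:
    """Return a q x g matrix of fused scores (one row per query image).

    Each entry averages the per-stream cosine similarities over the streams
    present in both images; equal weights stand in for a learned gate.
    """
    matrix = []
    for q in query:
        row = []
        for g in gallery:
            shared = [k for k, v in q.items() if v is not None and g.get(k) is not None]
            if not shared:
                row.append(0.0)  # no comparable stream for this pair
                continue
            row.append(sum(cosine(q[k], g[k]) for k in shared) / len(shared))
        matrix.append(row)
    return matrix
```

Sorting each row in descending order gives the ranked gallery list for that query, which is the input that CMC and mAP evaluation expect. Swapping the equal-weight average for a confidence-based gate changes only the line that combines the shared streams.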