PGDS: Pose-Guidance Deep Supervision for Mitigating Clothes-Changing in Person Re-Identification

Quoc-Huy Trinh; Nhat-Tan Bui; Dinh-Hieu Hoang; Phuoc-Thao Vo Thi; Hai-Dang Nguyen; Debesh Jha; Ulas Bagci; Ngan Le; Minh-Triet Tran

TL;DR

Pose-Guidance Deep Supervision (PGDS) tackles clothes-changing in person Re-Identification by training a Re-ID backbone under the guidance of a frozen pose encoder through a Pose-to-Human Projection (PHP) module. The approach uses multi-scale projectors to transfer pose knowledge to a SOLIDER-based human encoder, which is optimized with a triplet loss and a KL-divergence-based guide loss weighted by $\lambda=0.8$. Experiments on clothes-changing and clothes-consistent datasets show state-of-the-art results in clothes-changing scenarios, competitive performance on clothes-consistent benchmarks, and robust cross-domain transfer. Because the pose guidance is needed only during training, inference runs the human encoder alone and adds no extra computation, which makes PGDS practical for real-world surveillance applications. This work provides a foundation for further exploring pose-informed supervision in Re-ID and related biometric tasks.
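To make the weighting concrete, here is a minimal sketch of one plausible reading of the objective. It assumes the total loss is L = L_triplet + λ·L_guide with λ = 0.8, and that L_guide is KL(P_pose ‖ P_human) between softmax-normalized pose features (the frozen target) and projected human features, averaged over scales. The softmax normalization, the KL direction, the averaging over scales, and the additive form are all assumptions, not details confirmed by this summary.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def kl_guide_loss(pose_feats, human_feats, eps=1e-12):
    """Assumed guide loss: KL(P_pose || P_human), averaged over batch and scales.

    pose_feats, human_feats: lists of (B, C) arrays, one pair per scale.
    human_feats are assumed to be already projected to the pose width C.
    """
    losses = []
    for pose, human in zip(pose_feats, human_feats):
        p = softmax(pose)   # target distribution from the frozen pose encoder
        q = softmax(human)  # distribution from the projected human features
        kl = np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)
        losses.append(kl.mean())
    return float(np.mean(losses))

def total_loss(l_triplet, l_guide, lam=0.8):
    # Assumed form of the objective: L = L_triplet + lambda * L_guide.
    return l_triplet + lam * l_guide

# Usage with random stand-in features: 3 scales, batch of 4, width 256.
rng = np.random.default_rng(0)
pose = [rng.normal(size=(4, 256)) for _ in range(3)]
human = [rng.normal(size=(4, 256)) for _ in range(3)]
print(total_loss(0.5, kl_guide_loss(pose, human)))
```

Treating the pose distribution as the fixed target mirrors the fact that the pose encoder is frozen: only the human-side features receive gradients from this term.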

Abstract

The Person Re-Identification (Re-ID) task seeks to improve the tracking of multiple individuals across surveillance cameras. It also supports multimodal tasks, including text-based person retrieval and human matching. One of the most significant challenges in Re-ID is clothes-changing, where the same person may appear in different outfits. While previous methods have made notable progress on both clothes-consistent and clothes-changing data, they still rely excessively on clothing information, which can limit performance because human appearance is dynamic. To mitigate this challenge, we propose Pose-Guidance Deep Supervision (PGDS), an effective framework for learning pose guidance within the Re-ID task. It consists of three modules: a human encoder, a pose encoder, and a Pose-to-Human Projection module (PHP). Our framework guides the human encoder, i.e., the main re-identification model, with pose information from the pose encoder at multiple layers through the knowledge-transfer mechanism of the PHP module. This helps the human encoder learn body-part information without increasing computational cost at inference. Extensive experiments show that our method surpasses current state-of-the-art methods, demonstrating its robustness and effectiveness for real-world applications. Our code is available at https://github.com/huyquoctrinh/PGDS.
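The multi-layer guidance can be pictured as one projector per human-encoder stage, each mapping that stage's channel width into the pose feature space. The sketch below is hypothetical: the class name `MultiScaleProjector`, the Swin-like stage widths, the global average pooling, and the random linear weights are illustrative assumptions, and the actual PHP may operate on full spatial feature maps rather than pooled vectors.

```python
import numpy as np

class MultiScaleProjector:
    """Hypothetical stand-in for the PHP projectors.

    One linear projector per human-encoder stage maps that stage's
    channel width C_i to an assumed pose feature width C_pose.
    """

    def __init__(self, human_channels, pose_channels, seed=0):
        rng = np.random.default_rng(seed)
        # One (C_i x C_pose) weight matrix per stage; random init for illustration.
        self.weights = [
            rng.normal(scale=c ** -0.5, size=(c, pose_channels))
            for c in human_channels
        ]

    def __call__(self, human_feats):
        """human_feats: list of arrays shaped (B, H_i, W_i, C_i)."""
        projected = []
        for feat, w in zip(human_feats, self.weights):
            pooled = feat.mean(axis=(1, 2))  # global average pool -> (B, C_i)
            projected.append(pooled @ w)     # linear projection -> (B, C_pose)
        return projected

# Hypothetical stage widths; the real ones depend on the chosen backbone.
stages = [96, 192, 384, 768]
rng = np.random.default_rng(1)
feats = [
    rng.normal(size=(2, 64 // 2 ** i, 32 // 2 ** i, c))
    for i, c in enumerate(stages)
]
php = MultiScaleProjector(stages, pose_channels=256)
outs = php(feats)
print([o.shape for o in outs])  # [(2, 256), (2, 256), (2, 256), (2, 256)]
```

Because the projectors only produce targets for the guide loss, they can be discarded after training, which is consistent with the claim that inference cost does not increase.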

Paper Structure (13 sections, 5 equations, 3 figures, 6 tables)

Introduction
Related Work
Proposed PGDS
Pose Encoder
Human Encoder
Pose-to-Human Projection Module (PHP)
Objective Functions
Experiments
Experimental Setup
Performance Comparisons
Ablation Study
Feature Map Visualization
Conclusion

Figures (3)

Figure 1: An example query retrieved by our framework under the clothes-changing scenario.
Figure 2: Overall framework of the proposed PGDS, including three modules: a human encoder, a pose encoder, and a pose-to-human projection module (PHP). The pose encoder module uses a frozen pre-trained model, while we fine-tune a pre-trained human-centric model for the human encoder module. PHP transfers pose knowledge from the pose encoder module to the human encoder module through multiple projectors and the guide loss $\mathcal{L}_{guide}$. $H$, $W$, and $C$ denote the height, width, and channel, respectively. $\mathcal{L}_{triplet}$ is the triplet loss hoffer2015deep, used to learn person-centric representations.
Figure 3: Heatmap visualization of our framework compared with the baseline, SOLIDER chen2023beyond, to illustrate the behavior of our framework.
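For reference, the triplet term can be illustrated with the widely used margin-based (hinge) form over Euclidean distances. This is a swapped-in standard variant, not necessarily the exact formulation used in PGDS or in hoffer2015deep, and the margin of 0.3 is an assumed value.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Margin-based triplet loss averaged over a batch.

    anchor, positive, negative: arrays shaped (B, D).
    Per triplet: max(0, d(a, p) - d(a, n) + margin), with Euclidean d.
    The margin value is an assumption for illustration.
    """
    d_ap = np.linalg.norm(anchor - positive, axis=1)  # anchor-positive distances
    d_an = np.linalg.norm(anchor - negative, axis=1)  # anchor-negative distances
    return float(np.maximum(0.0, d_ap - d_an + margin).mean())

a = np.array([[0.0, 0.0]])
p = np.array([[1.0, 0.0]])
n_far = np.array([[3.0, 4.0]])  # distance 5 from the anchor
print(triplet_loss(a, p, n_far))  # 0.0: the negative is already beyond the margin
print(triplet_loss(a, p, p))      # 0.3: equal distances leave only the margin
```

The loss becomes zero once each negative is at least `margin` farther from the anchor than the positive, so only hard or semi-hard triplets contribute gradients.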