ProFD: Prompt-Guided Feature Disentangling for Occluded Person Re-Identification

Can Cui, Siteng Huang, Wenxuan Song, Pengxiang Ding, Min Zhang, Donglin Wang

TL;DR

A Prompt-guided Feature Disentangling method (ProFD) that leverages the rich pre-trained knowledge in the textual modality to help the model generate well-aligned part features, and adopts a hybrid-attention decoder that ensures spatial and semantic consistency during decoding to minimize the impact of noise.

Abstract

To address occlusion in person Re-Identification (ReID), many methods extract part features by introducing external spatial information. However, occlusion removes part appearance information, and the spatial information produced by external models is noisy. As a result, these purely vision-based approaches fail to correctly learn the features of human body parts from limited training data and struggle to accurately locate body parts, ultimately producing misaligned part features. To tackle these challenges, we propose a Prompt-guided Feature Disentangling method (ProFD), which leverages the rich pre-trained knowledge in the textual modality to help the model generate well-aligned part features. ProFD first designs part-specific prompts and uses noisy segmentation masks to preliminarily align visual and textual embeddings, giving the textual prompts spatial awareness. Furthermore, to alleviate the noise from external masks, ProFD adopts a hybrid-attention decoder that ensures spatial and semantic consistency during decoding, minimizing the impact of noise. Additionally, to avoid catastrophic forgetting, we employ a self-distillation strategy that retains the pre-trained knowledge of CLIP and mitigates over-fitting. Evaluation results on the Market1501, DukeMTMC-ReID, Occluded-Duke, Occluded-ReID, and P-DukeMTMC datasets demonstrate that ProFD achieves state-of-the-art results. Our project is available at: https://github.com/Cuixxx/ProFD.
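To make the idea of "aligning textual prompts with visual embeddings using noisy masks" more concrete, here is a minimal pure-Python sketch. It is not ProFD's actual code or loss. It assumes that K part-prompt embeddings and N patch embeddings share one D-dimensional space, and that the external segmentation mask has already been reduced to one (possibly wrong) part label per patch. Under those assumptions it does two things. First, it scores every patch against every prompt and applies a per-patch cross-entropy against the mask labels, which is one common way to give prompts spatial awareness. Second, it pools patch features with prompt-driven attention to obtain one feature per part. All function names (`similarity_map`, `mask_alignment_loss`, `part_features`) and the temperature `tau` are hypothetical.

```python
import math

# Hypothetical sketch: K part prompts (text embeddings) and N patch tokens
# (visual embeddings), assumed to live in the same D-dimensional space.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def similarity_map(prompts, patches, tau=1.0):
    """sim[k][i]: dot product between prompt k and patch i, divided by tau."""
    return [[dot(p, v) / tau for v in patches] for p in prompts]

def mask_alignment_loss(sim, labels):
    """Per-patch cross-entropy: softmax over the K prompts at each patch,
    scored against the (possibly noisy) part label from an external mask."""
    n = len(labels)
    loss = 0.0
    for i in range(n):
        probs = softmax([row[i] for row in sim])
        loss -= math.log(probs[labels[i]])
    return loss / n

def part_features(sim, patches):
    """Attention pooling: each prompt applies a softmax over patches and
    returns the weighted sum of patch features as its part feature."""
    dim = len(patches[0])
    feats = []
    for row in sim:
        w = softmax(row)
        feats.append([sum(w[i] * patches[i][d] for i in range(len(patches)))
                      for d in range(dim)])
    return feats

# Toy example: prompt 0 ~ "upper body", prompt 1 ~ "lower body" (hypothetical).
prompts = [[1.0, 0.0], [0.0, 1.0]]
patches = [[2.0, 0.0], [1.5, 0.1], [0.0, 2.0], [0.1, 1.8]]
labels = [0, 0, 1, 1]  # patch-level labels taken from a noisy part mask

sim = similarity_map(prompts, patches)
print(round(mask_alignment_loss(sim, labels), 3))
print(part_features(sim, patches))
```

In this toy setup, labels that agree with the prompt-patch similarities give a loss below the two-class chance level of log 2. Each pooled part feature is dominated by the patches that its prompt matches. A real system would use CLIP text and image embeddings, batched tensor operations, and a background or "no part" class. None of these is modeled here.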

Paper Structure (31 sections, 19 equations, 5 figures, 5 tables)

Figures (5)

  • Figure 1: Two crucial challenges of occluded person ReID. (a) Missing information caused by occlusion. (b) Noise in external spatial information. (c) Our proposed Prompt-guided Feature Disentangling method (ProFD).
  • Figure 2: Illustration of our proposed ProFD framework. It mainly contains three components: (1) Part-aware Knowledge Adaptation (left), (2) Prompt-guided Feature Disentangling (middle), and (3) General Knowledge Preservation & Fine-tuning (right). Part-aware Knowledge Adaptation aims to adapt CLIP to the occluded person ReID task. Prompt-guided Feature Disentangling employs a hybrid-attention decoder to extract the corresponding part features from the holistic feature map based on textual prompts. For the detailed structure of hybrid attention, please refer to Figure 3. General Knowledge Preservation uses global and part memory banks to prevent CLIP from forgetting its pre-trained knowledge during fine-tuning.
  • Figure 3: The architecture of the hybrid-attention decoder.
  • Figure 4: Evaluation of the performance with different momentum values $m_g$ and $m_p$ on Occluded-Duke.
  • Figure 5: Visualization of spatial-aware attention. (a) Unoccluded case. (b) Occluded case. Our method accurately focuses on the specified body regions following textual prompts in both cases.
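Figure 2 states that global and part memory banks preserve CLIP's general knowledge, and Figure 4 varies their momenta $m_g$ and $m_p$. The minimal pure-Python sketch below shows how a momentum-updated memory bank with a distillation term could work. It rests on several assumptions that this summary does not confirm. The bank holds one L2-normalized feature per key (an identity for the global bank, or an (identity, part) pair for the part bank). It is refreshed by an exponential moving average with momentum m. The fine-tuned feature is pulled toward its bank entry with a cosine loss. The class `MomentumMemoryBank`, the function `cosine_distill_loss`, and the momentum values are all hypothetical, and the sketch does not describe which features ProFD actually stores.

```python
import math

def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

class MomentumMemoryBank:
    """Hypothetical bank: one L2-normalized feature per key, refreshed by
    an exponential moving average with momentum m."""

    def __init__(self, momentum):
        self.m = momentum
        self.entries = {}

    def update(self, key, feat):
        feat = l2_normalize(feat)
        if key not in self.entries:
            self.entries[key] = feat  # first sighting: store as-is
        else:
            old = self.entries[key]
            # A higher m keeps more of the old entry, so the bank drifts slowly.
            mixed = [self.m * o + (1.0 - self.m) * f for o, f in zip(old, feat)]
            self.entries[key] = l2_normalize(mixed)
        return self.entries[key]

def cosine_distill_loss(student, teacher):
    """1 - cos(student, teacher): 0 when aligned, 1 when orthogonal."""
    s, t = l2_normalize(student), l2_normalize(teacher)
    return 1.0 - sum(a * b for a, b in zip(s, t))

# Hypothetical momenta; the paper's chosen values are not given here.
global_bank = MomentumMemoryBank(momentum=0.9)  # plays the role of m_g
part_bank = MomentumMemoryBank(momentum=0.5)    # plays the role of m_p

global_bank.update(7, [1.0, 0.0])
g = global_bank.update(7, [0.0, 1.0])        # identity 7, global feature
part_bank.update((7, 0), [1.0, 0.0])
p = part_bank.update((7, 0), [0.0, 1.0])     # identity 7, part index 0
print(g, p)
print(cosine_distill_loss([0.8, 0.6], g))
```

The momentum controls a trade-off. Values close to 1 keep the bank close to what it stored earlier, which favors preserving prior knowledge. Smaller values let it follow the fine-tuned model more quickly. Figure 4 explores this trade-off for $m_g$ and $m_p$ on Occluded-Duke.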