PAFormer: Part Aware Transformer for Person Re-identification
Hyeono Jung, Jangwon Lee, Jiwon Yoo, Dami Ko, Gyeonghwan Kim
TL;DR
PAFormer tackles partial ReID by introducing pose tokens that explicitly associate patch tokens with body parts, enabling precise part-to-part comparisons. It handles occlusion with a learning-based visibility predictor and a teacher-forcing mechanism based on ground-truth visibility, and at inference it requires no extra pose-localization modules. The method optimizes a joint loss comprising CLS ReID, partial ReID, pose supervision, and visibility terms, and computes the distance between samples $i$ and $j$ as $d^{i,j} = d_{CLS}^{i,j} + \frac{\sum_p d_p^{i,j} v_p^{i} v_p^{j}}{\sum_p v_p^{i} v_p^{j}}$, where $d_p^{i,j}$ is the distance for body part $p$ and $v_p^{i}$ is the visibility score of part $p$ in sample $i$. Experiments on Market-1501, DukeMTMC-ReID, and Occluded-Duke show state-of-the-art or competitive performance, highlighting improved robustness to occlusion and better part-level alignment. PAFormer thus integrates anatomical awareness into a transformer framework for ReID.
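The distance formula above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the choice of Euclidean distance for both $d_{CLS}$ and $d_p$, and the fallback to the CLS distance when no part is visible in both samples are all assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def paformer_distance(cls_i, cls_j, parts_i, parts_j, vis_i, vis_j, eps=1e-12):
    """Sketch of d^{i,j} = d_CLS^{i,j} + sum_p(d_p * v_p^i * v_p^j) / sum_p(v_p^i * v_p^j).

    cls_i, cls_j     : global (CLS) feature vectors of samples i and j
    parts_i, parts_j : lists of per-part feature vectors, in the same part order
    vis_i, vis_j     : per-part visibility scores, assumed to lie in [0, 1]
    """
    d_cls = euclidean(cls_i, cls_j)
    weighted_sum = 0.0
    weight_total = 0.0
    for f_i, f_j, v_i, v_j in zip(parts_i, parts_j, vis_i, vis_j):
        w = v_i * v_j  # a part contributes only if it is visible in both samples
        weighted_sum += euclidean(f_i, f_j) * w
        weight_total += w
    if weight_total < eps:
        # Assumption: with no jointly visible part, use the CLS distance alone.
        return d_cls
    return d_cls + weighted_sum / weight_total


# Part 1 is occluded in sample j, so only part 0 enters the part-level term.
d = paformer_distance(
    cls_i=[0.0, 0.0], cls_j=[3.0, 4.0],
    parts_i=[[0.0], [0.0]], parts_j=[[2.0], [10.0]],
    vis_i=[1.0, 1.0], vis_j=[1.0, 0.0],
)
```

Because each part distance is weighted by the product of the two visibility scores, a part occluded in either sample drops out of both the numerator and the denominator, so the part-level term stays an average over jointly visible parts.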
Abstract
In person re-identification (ReID), partial ReID methods, which measure feature distances by comparing body parts between samples, are considered mainstream. In practice, however, previous methods often lack sufficient awareness of the anatomical aspects of body parts and therefore fail to capture features of the same body parts across different samples. To address this issue, we introduce the Part Aware Transformer (PAFormer), a pose-estimation-based ReID model that can perform precise part-to-part comparison. To inject part awareness, we introduce learnable parameters called 'pose tokens', which estimate the correlation between each body part and partial regions of the image. Notably, at inference time, PAFormer operates without the additional body-part localization modules that are commonly used in previous ReID methods leveraging pose estimation models. Building on this enhanced awareness of body parts, PAFormer uses a learning-based visibility predictor to estimate the degree of occlusion of each body part. We also introduce a teacher-forcing technique that uses ground-truth visibility scores, enabling PAFormer to be trained only on visible parts. Extensive experiments show that our method outperforms existing approaches on well-known ReID benchmark datasets.
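One plausible reading of "trained only on visible parts" is that ground-truth visibility masks the per-part loss during training. The sketch below illustrates that idea only; the abstract does not specify the loss, so the function name, the binary mask, and the averaging over visible parts are hypothetical choices rather than the authors' formulation.

```python
def masked_part_loss(part_losses, gt_visibility):
    """Average per-part losses over parts marked visible by ground truth.

    part_losses   : list of per-part loss values (hypothetical, any part-level loss)
    gt_visibility : list of 0/1 ground-truth visibility flags, same order as part_losses
    """
    visible_losses = [loss for loss, v in zip(part_losses, gt_visibility) if v]
    if not visible_losses:
        # Assumption: a fully occluded sample contributes no part-level loss.
        return 0.0
    return sum(visible_losses) / len(visible_losses)


# The occluded middle part (visibility 0) is excluded from the average.
loss = masked_part_loss([1.0, 3.0, 5.0], [1, 0, 1])
```

Using the ground-truth flags rather than the model's own predicted visibility during training is what makes this resemble teacher forcing: the part-level objective does not depend on a visibility predictor that may still be inaccurate early in training.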
