Keypoint Promptable Re-Identification

Vladimir Somers, Christophe De Vleeschouwer, Alexandre Alahi

TL;DR

This work introduces Keypoint Promptable ReID (KPR), a prompt-driven approach to address Multi-Person Ambiguity in occluded person re-identification by conditioning appearance encoding on semantic keypoints. The model combines a tokenization scheme for image and keypoints, a Multi-Stage Feature Fusion Swin backbone, and a Part-based Head to produce body-part embeddings with visibility scores, trained with a GiLt-based ReID loss and a token-level part-prediction loss. A novel Batch-wise Inter-Person Occlusion (BIPO) augmentation and a new Occ-PTrack dataset with keypoint annotations enable robust learning under multi-person occlusions and allow explicit target identification within bounding boxes. Empirically, KPR achieves state-of-the-art performance on Occluded-Duke and Occ-PTrack, and demonstrates strong gains in pose tracking, with the prompts providing consistent benefits even when partially missing. The work also demonstrates the prompt-optional nature of KPR and releases code, annotations, and Occ-PTrack to encourage broader exploration of promptable ReID paradigms.

Abstract

Occluded Person Re-Identification (ReID) is a metric learning task that involves matching occluded individuals based on their appearance. While many studies have tackled occlusions caused by objects, multi-person occlusions remain less explored. In this work, we identify and address a critical challenge overlooked by previous occluded ReID methods: the Multi-Person Ambiguity (MPA) arising when multiple individuals are visible in the same bounding box, making it impossible to determine the intended ReID target among the candidates. Inspired by recent work on prompting in vision, we introduce Keypoint Promptable ReID (KPR), a novel formulation of the ReID problem that explicitly complements the input bounding box with a set of semantic keypoints indicating the intended target. Since promptable re-identification is an unexplored paradigm, existing ReID datasets lack the pixel-level annotations necessary for prompting. To bridge this gap and foster further research on this topic, we introduce Occluded-PoseTrack ReID, a novel ReID dataset with keypoint labels that features strong inter-person occlusions. Furthermore, we release custom keypoint labels for four popular ReID benchmarks. Experiments on both person retrieval and pose tracking demonstrate that our method systematically surpasses previous state-of-the-art approaches on various occluded scenarios. Our code, dataset and annotations are available at https://github.com/VlSomers/keypoint_promptable_reidentification.

Paper Structure

This paper contains 21 sections, 3 equations, 9 figures, and 7 tables.

Figures (9)

  • Figure 1: Overview of our proposed Keypoint Promptable ReID (KPR) method. KPR takes an image with keypoint prompts as input and produces part-based features of the prompted target. The prompt instructs the model to focus on a specific individual, in this example either the man in the blue jacket at the back (b) or the man in the black t-shirt at the front (c). Colored dots illustrate the positive keypoint prompts, with one color per body part.
  • Figure 2: Person retrieval with Multi-Person Ambiguity (MPA). Green/red borders are correct/incorrect matches. Red/pastel dots indicate negative/positive prompts.
  • Figure 2: Multi-person pose tracking performance in videos on PoseTrack21 (Doering et al., 2022).
  • Figure 3: Architecture overview of our proposed Keypoint Promptable ReID (KPR) model. The Batch-wise Inter-Person Occlusion (BIPO) data augmentation is first applied to generate artificial inter-person occlusions. The image and the optional positive/negative ($\oplus / \ominus$) prompts are then tokenized and summed. The tokens are fed to our proposed Multi-Stage Feature Fusion (MSF) Swin transformer to generate high-resolution feature maps. The feature map is then passed to a Part-based Head (PBH), which uses a part classifier to assign each token to a part (or to the background) and then averages all tokens of the same part to produce the final $K$ part-based embeddings $\{f_1, \dots, f_K\}$ and their binary visibility scores $\{v_1, \dots, v_K\}$, with $v_i \in \{0, 1\}$, visually denoted by {✗, ✓}. A part $i$ with no token assigned is considered invisible (i.e., $v_i = 0$) and is ignored when computing the similarity between two samples. KPR is illustrated here for $K = 8$, with a unique color for each body part: {head, torso, right/left arm, right/left leg, and right/left foot}. Finally, the entire pipeline is trained with two losses: a Part-Prediction (PP) Loss and a ReID Loss.
  • Figure 4: (a) Our proposed Batch-wise Inter-Person Occlusion (BIPO) data augmentation creates artificial person occlusions that are consistent across image, prompt, and human parsing labels. BIPO is crucial to force the model to rely on the input prompts. (b) Four identities with their keypoint labels from our Occ-PTrack dataset.
  • ...and 4 more figures
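
The Part-based Head described in the Figure 3 caption (assign each token to a part or to the background, average the tokens of each part, mark parts with no token as invisible, and ignore invisible parts when comparing two samples) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names `pool_parts` and `part_distance`, the use of hard integer labels with 0 for the background, the Euclidean distance averaged over commonly visible parts, and the infinite distance returned when no part is shared are all assumptions made for illustration.

```python
from math import sqrt

BACKGROUND = 0  # assumed label for background; parts are labelled 1..num_parts


def pool_parts(tokens, labels, num_parts):
    """Average the tokens assigned to each part.

    tokens: list of feature vectors (lists of floats), one per token.
    labels: list of ints, 0 for background or 1..num_parts for a body part.
    Returns (embeddings, visibility), each of length num_parts.
    """
    dim = len(tokens[0])
    sums = [[0.0] * dim for _ in range(num_parts)]
    counts = [0] * num_parts
    for feat, label in zip(tokens, labels):
        if label == BACKGROUND:
            continue  # background tokens contribute to no part
        part = label - 1
        counts[part] += 1
        for d in range(dim):
            sums[part][d] += feat[d]
    # A part with no token keeps a zero embedding and gets visibility 0.
    embeddings = [
        [s / c for s in vec] if c > 0 else [0.0] * dim
        for vec, c in zip(sums, counts)
    ]
    visibility = [1 if c > 0 else 0 for c in counts]
    return embeddings, visibility


def part_distance(emb_a, vis_a, emb_b, vis_b):
    """Mean Euclidean distance over the parts visible in both samples."""
    dists = [
        sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)))
        for fa, va, fb, vb in zip(emb_a, vis_a, emb_b, vis_b)
        if va and vb
    ]
    # No commonly visible part: return infinity (an assumption of this sketch).
    return sum(dists) / len(dists) if dists else float("inf")


# Toy example: four 2-D tokens, three parts, the last token is background.
tokens = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [5.0, 5.0]]
labels = [1, 1, 2, 0]
emb, vis = pool_parts(tokens, labels, num_parts=3)

# A second sample in which only parts 1 and 3 are visible.
emb_b = [[2.0, 3.0], [9.0, 9.0], [0.0, 0.0]]
vis_b = [1, 0, 1]
dist = part_distance(emb, vis, emb_b, vis_b)
```

In the toy example, part 3 receives no token in the first sample and part 2 is invisible in the second, so only part 1 contributes to the distance. A real model would predict the per-token labels with a learned classifier rather than receive them as input.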