Keypoint Promptable Re-Identification
Vladimir Somers, Christophe De Vleeschouwer, Alexandre Alahi
TL;DR
This work introduces Keypoint Promptable ReID (KPR), a prompt-driven approach that addresses Multi-Person Ambiguity in occluded person re-identification by conditioning appearance encoding on semantic keypoints. The model combines a tokenization scheme for image and keypoints, a Multi-Stage Feature Fusion Swin backbone, and a Part-based Head that produces body-part embeddings with visibility scores. It is trained with a GiLt-based ReID loss and a token-level part-prediction loss. A novel Batch-wise Inter-Person Occlusion (BIPO) augmentation and a new dataset with keypoint annotations, Occluded-PoseTrack ReID (Occ-PTrack), enable robust learning under multi-person occlusions and allow explicit target identification within bounding boxes. Empirically, KPR achieves state-of-the-art performance on Occluded-Duke and Occ-PTrack and shows strong gains in pose tracking, with prompts providing consistent benefits even when some keypoints are missing. The work also shows that prompts are optional for KPR, and it releases code, annotations, and Occ-PTrack to encourage broader exploration of promptable ReID paradigms.
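To make the role of body-part embeddings and visibility scores concrete, the following is a minimal sketch of a visibility-weighted part distance, a scheme commonly used in part-based ReID. It is an illustration under stated assumptions, not necessarily the exact matching rule used by KPR: the function name `part_distance`, the use of Euclidean distance, and the product of visibility scores as the weight are all hypothetical choices.

```python
import math


def part_distance(parts_a, vis_a, parts_b, vis_b):
    """Average Euclidean distance over body parts visible in both samples.

    parts_*: list of K embedding vectors (lists of floats), one per body part.
    vis_*:   list of K visibility scores in [0, 1].
    Assumption: a part's weight is the product of its two visibility scores.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for pa, va, pb, vb in zip(parts_a, vis_a, parts_b, vis_b):
        w = va * vb  # a part contributes only if it is visible in both samples
        weighted_sum += w * math.dist(pa, pb)
        weight_total += w
    if weight_total == 0.0:
        return math.inf  # no mutually visible part: no evidence for a match
    return weighted_sum / weight_total


# Toy example with two body parts and 2-D embeddings.
query_parts = [[0.0, 0.0], [1.0, 0.0]]
gallery_parts = [[3.0, 4.0], [1.0, 0.0]]

# Part 0 is occluded in the gallery sample, so only part 1 is compared.
occluded = part_distance(query_parts, [1.0, 1.0], gallery_parts, [0.0, 1.0])

# With both parts visible, the mismatching part 0 raises the distance.
visible = part_distance(query_parts, [1.0, 1.0], gallery_parts, [1.0, 1.0])
```

Under this weighting, an occluded part cannot penalize a match, which is the reason a model would predict visibility scores alongside the part embeddings.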
Abstract
Occluded Person Re-Identification (ReID) is a metric learning task that involves matching occluded individuals based on their appearance. While many studies have tackled occlusions caused by objects, multi-person occlusions remain less explored. In this work, we identify and address a critical challenge overlooked by previous occluded ReID methods: the Multi-Person Ambiguity (MPA) that arises when multiple individuals are visible in the same bounding box, making it impossible to determine the intended ReID target among the candidates. Inspired by recent work on prompting in vision, we introduce Keypoint Promptable ReID (KPR), a novel formulation of the ReID problem that explicitly complements the input bounding box with a set of semantic keypoints indicating the intended target. Since promptable re-identification is an unexplored paradigm, existing ReID datasets lack the pixel-level annotations necessary for prompting. To bridge this gap and foster further research on this topic, we introduce Occluded-PoseTrack ReID, a novel ReID dataset with keypoint labels that features strong inter-person occlusions. Furthermore, we release custom keypoint labels for four popular ReID benchmarks. Experiments on both person retrieval and pose tracking demonstrate that our method consistently surpasses previous state-of-the-art approaches in various occluded scenarios. Our code, dataset and annotations are available at https://github.com/VlSomers/keypoint_promptable_reidentification.
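As an illustration of what complementing a bounding box with a keypoint prompt could look like in practice, the sketch below rasterizes keypoints into per-keypoint Gaussian heatmaps that a model could consume alongside the image crop. This encoding is an assumption made for illustration: the text above states only that keypoints indicate the intended target, and the function name `keypoints_to_heatmaps`, the `(x, y, confidence)` format, and the `sigma` parameter are hypothetical.

```python
import math


def keypoints_to_heatmaps(keypoints, height, width, sigma=2.0):
    """Render one Gaussian heatmap per keypoint.

    keypoints: list of (x, y, confidence) in pixel coordinates of the crop.
    Returns a list of height x width grids (nested lists of floats).
    A keypoint with confidence 0 (missing) yields an all-zero map.
    """
    heatmaps = []
    for x, y, conf in keypoints:
        grid = [
            [
                conf * math.exp(-((col - x) ** 2 + (row - y) ** 2) / (2 * sigma ** 2))
                for col in range(width)
            ]
            for row in range(height)
        ]
        heatmaps.append(grid)
    return heatmaps


# A 4x5 crop with one visible keypoint at (x=2, y=1) and one missing keypoint.
prompt = keypoints_to_heatmaps([(2, 1, 1.0), (0, 0, 0.0)], height=4, width=5)
```

Because a missing keypoint simply produces an empty channel, this kind of encoding degrades gracefully when the prompt is partial or absent.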
