DROP: Decouple Re-Identification and Human Parsing with Task-specific Features for Occluded Person Re-identification

Shuguang Dou; Xiangyang Jiang; Yuanpeng Tu; Junyao Gao; Zefan Qu; Qingsong Zhao; Cairong Zhao

TL;DR

DROP tackles occluded ReID by decoupling ReID and human parsing into task-specific feature streams, addressing the conflicting granularity needs of instance-level ReID and semantic parsing. It introduces Detail-Preserving Upsampling to fuse multi-scale backbone features for parsing, and a Pedestrian Position Encoder to inject height-based spatial cues, while the ReID branch leverages a Parts Embedding Memory Bank and a Part-aware Compactness Triplet loss to strengthen part-level discrimination. The Parsing Guided ReID Branch uses Weighted Average and Max Pooling to integrate parsing signals into ReID representations, with a memory-based training regime and spatially smoothed parsing loss to stabilize learning. Empirically, DROP achieves state-of-the-art Rank-1 and mAP on Occluded-Duke and competitive results on holistic datasets, illustrating the effectiveness of task-specific decoupling for occluded person ReID.
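As a rough illustration of the Weighted Average and Max Pooling idea mentioned above, the sketch below pools a feature map under a soft parsing mask. The function name `weighted_avg_max_pool`, the concatenation of the two pooled vectors, and the use of a single mask are assumptions made for this example; the summary does not specify how DROP combines the two pooling results.

```python
import numpy as np


def weighted_avg_max_pool(features, mask, eps=1e-6):
    """Pool a feature map under a soft parsing mask (illustrative only).

    features: array of shape (C, H, W), assumed non-negative (e.g. post-ReLU).
    mask:     array of shape (H, W) with values in [0, 1], e.g. the parsing
              probability of one body part or of the foreground.
    Returns a vector of shape (2 * C,): mask-weighted average followed by
    mask-weighted max. Concatenation is an assumption of this sketch.
    """
    weights = mask[None, :, :]                  # broadcast mask over channels
    weighted = features * weights               # suppress pixels outside the part
    avg = weighted.sum(axis=(1, 2)) / (weights.sum() + eps)
    # Multiplying by the mask only works as a masked max for non-negative features.
    mx = weighted.max(axis=(1, 2))
    return np.concatenate([avg, mx])


# Toy input: 2 channels on a 2x2 grid, mask selects the diagonal pixels.
toy_features = np.arange(8, dtype=float).reshape(2, 2, 2)
toy_mask = np.array([[1.0, 0.0], [0.0, 1.0]])
toy_embedding = weighted_avg_max_pool(toy_features, toy_mask)
```

Calling the function once per part mask would yield one embedding per body part, which is one plausible way parsing output could guide the ReID features.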

Abstract

The paper introduces the Decouple Re-identificatiOn and human Parsing (DROP) method for occluded person re-identification (ReID). Unlike mainstream approaches using global features for simultaneous multi-task learning of ReID and human parsing, or relying on semantic information for attention guidance, DROP argues that the inferior performance of the former is due to distinct granularity requirements for ReID and human parsing features. ReID focuses on instance part-level differences between pedestrian parts, while human parsing centers on semantic spatial context, reflecting the internal structure of the human body. To address this, DROP decouples features for ReID and human parsing, proposing detail-preserving upsampling to combine varying resolution feature maps. Parsing-specific features for human parsing are decoupled, and human position information is exclusively added to the human parsing branch. In the ReID branch, a part-aware compactness loss is introduced to enhance instance-level part differences. Experimental results highlight the efficacy of DROP, especially achieving a Rank-1 accuracy of 76.8% on Occluded-Duke, surpassing two mainstream methods. The codebase is accessible at https://github.com/shuguang-52/DROP.
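The idea of combining feature maps of varying resolution can be sketched as follows. This is a minimal illustration, not the paper's detail-preserving upsampling module: nearest-neighbour upsampling and channel-wise concatenation are assumptions chosen for simplicity, and the helper names `upsample_nearest` and `fuse_multiscale` are hypothetical.

```python
import numpy as np


def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling: (C, H, W) -> (C, H * factor, W * factor)."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)


def fuse_multiscale(feature_maps):
    """Bring all maps to the resolution of the first one and concatenate channels.

    feature_maps: list of arrays shaped (C_i, H_i, W_i), ordered from highest
    to lowest resolution. Each map must be smaller than the first by the same
    integer factor in height and width (an assumption of this sketch).
    """
    target_h, target_w = feature_maps[0].shape[1:]
    resized = []
    for fmap in feature_maps:
        _, h, w = fmap.shape
        if target_h % h or target_w % w or target_h // h != target_w // w:
            raise ValueError("each map must divide the target size by one integer factor")
        resized.append(upsample_nearest(fmap, target_h // h))
    return np.concatenate(resized, axis=0)


# Toy example: a high-resolution map and a map at half that resolution.
high_res = np.zeros((4, 8, 4))
low_res = np.ones((8, 4, 2))
fused = fuse_multiscale([high_res, low_res])
```

In a real network the upsampling would typically be followed by learned layers; the point here is only that the parsing branch sees fine spatial detail from shallow stages alongside the semantics of deeper ones.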

Paper Structure (39 sections, 8 equations, 6 figures, 9 tables)

Introduction
Related work
Occluded Person Re-identification.
Decoupled Heads for Multi-Task Learning.
Method
Overview
Motivation.
Overall framework.
Human Position-aware Parsing Branch
Detail-preserving upsampling.
Pedestrian position-aware feature.
Parsing Guided ReID Branch
Training.
Inference.
Optimization
...and 24 more sections

Figures (6)

Figure 1: Comparison of three methods for occluded person ReID. (a) A multi-task learning framework that simultaneously learns the ReID and segmentation tasks from the same features. (b) A dual supervised attention module that learns from ID labels and extra coarse human parsing labels. (c) Our DROP.
Figure 2: Structure of DROP with decoupled branches. The model consists of a human position-aware parsing branch for human parsing and a parsing guided ReID branch for producing the global, foreground, and parts embeddings. WAMP denotes the global weighted average and max pooling. $\mathcal{L}_{pct}$ denotes the part-aware compactness triplet loss.
Figure 3: Qualitative results of DROP. The blue box represents the query, the green box in the first column indicates a successful retrieval, and the red box indicates a failed retrieval. The green boxes in the next 9 columns indicate that the human part is partitioned, while the red boxes indicate that there is no corresponding human part.
Figure 4: The structure of Weighted Average and Max Pooling (WAMP).
Figure 5: Accuracy during training. Left: the accuracy of human parsing. Right: the accuracy of the foreground embedding.
...and 1 more figures
