DDRN: A Data Distribution Reconstruction Network for Occluded Person Re-Identification

Zhaoyong Wang, Yujie Liu, Mingyue Li, Wenxin Zhang, Zongmin Li

TL;DR

The paper proposes the Data Distribution Reconstruction Network (DDRN), a generative model that uses the data distribution to filter out irrelevant details, improving overall feature perception and reducing interference from irrelevant features under occlusion.

Abstract

In occluded person re-identification (ReID), severe occlusions introduce a large amount of irrelevant information that hinders the accurate identification of individuals. These irrelevant cues stem primarily from background interference and occluder interference, and they degrade the final retrieval results. Traditional discriminative models, which rely on the specific content and positions of image regions, often misclassify under occlusion. To address these limitations, we propose the Data Distribution Reconstruction Network (DDRN), a generative model that leverages the data distribution to filter out irrelevant details, enhancing overall feature perception and reducing interference from irrelevant features. In addition, severe occlusions make the feature space more complex. To handle this, we design a multi-center approach based on the proposed Hierarchical SubcenterArcface (HS-Arcface) loss function, which can better approximate complex feature spaces. On the Occluded-Duke dataset, we achieve an mAP of 62.4% (+1.1%) and a rank-1 accuracy of 71.3% (+0.6%), surpassing the latest state-of-the-art method (FRT).
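
The multi-center idea mentioned in the abstract can be illustrated with the published Sub-center ArcFace formulation, in which each class keeps K sub-centers, the most similar sub-center supplies the class logit, and an additive angular margin is applied to the target class. The sketch below is only that baseline, not HS-Arcface itself: the hierarchical part is not specified in this abstract, and the function names and the `scale` and `margin` defaults are illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def subcenter_arcface_logits(feature, subcenters, target, scale=30.0, margin=0.5):
    """Return one logit per class.

    subcenters[c] holds the K sub-center vectors of class c.
    scale and margin are assumed values, not taken from the paper.
    """
    logits = []
    for class_id, centers in enumerate(subcenters):
        # Sub-center pooling: keep the most similar sub-center of this class.
        cos_theta = max(cosine(feature, w) for w in centers)
        # Clamp to guard acos against floating-point overshoot.
        cos_theta = max(-1.0, min(1.0, cos_theta))
        if class_id == target:
            # ArcFace: add the angular margin to the target class only.
            # (The usual safeguard for theta + margin > pi is omitted here.)
            cos_theta = math.cos(math.acos(cos_theta) + margin)
        logits.append(scale * cos_theta)
    return logits
```

These logits would normally be passed to a softmax cross-entropy loss. The margin lowers the target-class logit, so the feature has to move closer to one of its class sub-centers to be classified with the same confidence.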

Paper Structure

This paper contains 18 sections, 4 equations, 7 figures, 2 tables.

Figures (7)

  • Figure 1: Retrieval results (top) for the ViT baseline and our approach, and the overall structure (bottom) of our proposed method. The upper part illustrates the impact of occlusion and background interference on retrieval results. Green boxes mark correct retrievals and red boxes mark incorrect ones. The leftmost image is the query, the middle column shows the retrieval results of the ViT baseline, and the rightmost column shows the results of our proposed method. The lower part gives an overview of our solution, which achieves feature reconstruction by predicting the distribution of the intermediate process.
  • Figure 2: The network architecture of DDRN. It uses standard transformer blocks as the feature extractor. The features between the $(N-1)$-th and $N$-th layers are replaced by the most similar vectors in the Embedding Space. An Orthogonal Loss is employed to ensure that the vectors in the Embedding Space represent different types of features separately. The final global CLS token uses our proposed HS-Arcface loss. The CLS token that fuses local features adopts the ID loss.
  • Figure 3: Distribution of the pairwise cosine distances between distinct vectors in the Embedding Space (each vector's distance to itself is excluded). We analyze this distribution statistically to demonstrate the effect of the proposed Orthogonal Loss. The x-axis represents the cosine distance, and the y-axis represents the corresponding probability density. The curve is obtained by kernel density estimation with a Gaussian kernel.
  • Figure 4: Comparison among ArcFace, Subcenter ArcFace, and HS-Arcface. The disk represents the entire feature space. ArcFace (a) performs classification in the feature space. Subcenter ArcFace (b) introduces multiple sub-center representations on top of ArcFace. HS-Arcface (c) addresses the reduced inter-class distances caused by introducing K sub-center representations in Subcenter ArcFace.
  • Figure 5: The network architecture of HS-Arcface. In the initial stage, we set $i$ to 1. We select the $i$-th sub-center among the K sub-center representations, compute its similarity with the input feature, and determine the corresponding class label. If the class label equals $N+1$, the feature requires further classification at the next level; HS-Arcface introduces this additional clustering center ($N+1$) as a marker for classification at the following layer. If $i$ is less than K, the final layer has not been reached, so we increment $i$ by one and continue the loop. Finally, we apply the ArcFace operation to the output.
  • ...and 2 more figures
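
The replacement step and the Orthogonal Loss described in the Figure 2 and Figure 3 captions can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The captions do not give the exact form of the loss, so the penalty below (mean squared cosine similarity over distinct pairs) is an assumed stand-in, and `replace_with_nearest`, `orthogonality_penalty`, and `codebook` are hypothetical names. Gradient handling for the replacement step is not covered.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    return sum(a * b for a, b in zip(normalize(u), normalize(v)))


def replace_with_nearest(tokens, codebook):
    """Replace each token feature with its most cosine-similar codebook vector.

    codebook plays the role of the Embedding Space in the captions.
    """
    replaced = []
    for token in tokens:
        best = max(range(len(codebook)), key=lambda j: cosine(token, codebook[j]))
        replaced.append(list(codebook[best]))
    return replaced


def orthogonality_penalty(codebook):
    """Mean squared cosine similarity over distinct pairs (assumed loss form).

    The value is 0 when all codebook vectors are mutually orthogonal.
    """
    total, pairs = 0.0, 0
    for i in range(len(codebook)):
        for j in range(i + 1, len(codebook)):
            total += cosine(codebook[i], codebook[j]) ** 2
            pairs += 1
    return total / pairs if pairs else 0.0
```

For example, with `codebook = [[1.0, 0.0], [0.0, 1.0]]`, the token `[0.9, 0.1]` is replaced by `[1.0, 0.0]` and the penalty is zero, because the two codebook vectors are already orthogonal. Minimizing such a penalty pushes the pairwise cosine distances toward 1, which is the kind of concentration a plot like Figure 3 would reveal.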