
A Deep Hierarchical Feature Sparse Framework for Occluded Person Re-Identification

Yihu Song, Shuaishi Liu

TL;DR

This work addresses occluded person re-identification with a focus on real-time inference. It introduces SUReID, a lightweight vision-transformer framework that uses hierarchical token sparsification to prune uninformative tokens, reducing self-attention cost while mitigating occlusion interference. To preserve discriminative representations, SUReID employs non-parametric feature alignment knowledge distillation from a pretrained teacher and augments training with realistic occluders via noise occlusion data augmentation. Experimental results across occluded and holistic ReID datasets demonstrate strong accuracy and significantly faster inference compared to pose- or parse-based baselines, with ablations confirming the positive impact of HTS, NPKD, and NODA on performance. The approach offers a practical, scalable solution for robust ReID in real-world, occlusion-heavy environments.

Abstract

Most existing methods tackle occluded person re-identification (ReID) with auxiliary models, which makes the ReID framework complicated and too inefficient for real-time applications. In this work, a speed-up person ReID framework named SUReID is proposed to mitigate occlusion interference while accelerating inference. SUReID consists of three key components: a hierarchical token sparsification (HTS) strategy, non-parametric feature alignment knowledge distillation (NPKD), and noise occlusion data augmentation (NODA). The HTS strategy prunes redundant tokens in the vision transformer to make self-attention computation more efficient and to eliminate interference from occlusions or background noise. However, the pruned tokens may contain human part features, and discarding them contaminates the feature representation and degrades performance. To solve this problem, NPKD supervises the HTS strategy so that it retains more discriminative tokens and discards meaningless ones. Furthermore, NODA introduces more noisy samples, which further trains the HTS strategy to disentangle different tokens. Experimental results show that SUReID achieves superior performance with fast inference.
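
The token pruning idea behind HTS can be illustrated with a minimal sketch. This is not the authors' implementation: the per-token importance scores, the fixed keep ratio, the number of pruning stages, and all function names below are assumptions made for illustration. The sketch keeps the highest-scoring tokens at each stage and estimates how the pairwise self-attention cost shrinks, using the fact that this cost grows with the square of the token count.

```python
from typing import List, Sequence, Tuple


def prune_tokens(
    tokens: Sequence[Sequence[float]],
    scores: Sequence[float],
    keep_ratio: float,
) -> Tuple[List[List[float]], List[int]]:
    """Keep the highest-scoring tokens, preserving their original order.

    `scores` is a hypothetical per-token importance value; how the paper
    derives its scores is not stated in the abstract.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    n_keep = max(1, round(len(tokens) * keep_ratio))
    # Rank indices by score (descending), then restore the original order.
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:n_keep])
    return [list(tokens[i]) for i in kept], kept


def hierarchical_token_counts(n_tokens: int, keep_ratio: float, n_stages: int) -> List[int]:
    """Token count after each stage when the same ratio is applied repeatedly."""
    counts = [n_tokens]
    for _ in range(n_stages):
        counts.append(max(1, round(counts[-1] * keep_ratio)))
    return counts


def relative_attention_cost(n_before: int, n_after: int) -> float:
    """Ratio of pairwise attention cost after pruning to the cost before it."""
    return (n_after / n_before) ** 2
```

For example, `prune_tokens(tokens, scores, 0.5)` returns the surviving tokens together with their original indices, and `hierarchical_token_counts(128, 0.5, 3)` shows how the token count shrinks when the same assumed ratio is applied at three successive stages.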

Paper Structure

This paper contains 17 sections, 14 equations, 6 figures, and 7 tables.

Figures (6)

  • Figure 1: Illustration of the different methods to tackle the occluded ReID task.
  • Figure 2: The pipeline of the SUReID framework.
  • Figure 3: Illustration of the HTS strategy in the student encoder.
  • Figure 4: Visualization results of test images from Occluded-DukeMTMC.
  • Figure 5: Illustration of the augmented pedestrians by the NODA.
  • ...and 1 more figure