Feature Completion Transformer for Occluded Person Re-identification

Tao Wang, Mengyuan Liu, Hong Liu, Wenhao Li, Miaoju Ban, Tuanyu Guo, Yidi Li

TL;DR

Occluded person re-identification is hampered by occluders that distort features and erase appearance cues. The authors develop Feature Completion Transformer (FCFormer), which uses Occlusion Instance Augmentation to create diverse occlusion samples, a dual-stream ViT-based encoder to learn paired holistic-occluded representations, and a self-supervised Feature Completion Decoder with learnable tokens to impute missing occluded information. Two novel losses, Cross Hard Triplet (CHT) and Feature Completion Consistency (FC$^2$), align features across occluded, holistic, and completed modalities and enforce distributional consistency. Across five challenging datasets, FCFormer achieves state-of-the-art results on occluded benchmarks and demonstrates robust transferability, while remaining competitive on holistic Re-ID without external cues. The approach offers a flexible, self-contained mechanism for feature completion that enhances occlusion robustness in practical surveillance scenarios.

Abstract

Occluded person re-identification (Re-ID) is a challenging problem due to the destruction caused by occluders. Most existing methods focus on visible human body parts through some prior information. However, when complementary occlusions occur, features in occluded regions can interfere with matching, which affects performance severely. In this paper, different from most previous works that discard the occluded region, we propose a Feature Completion Transformer (FCFormer) to implicitly complement the semantic information of occluded parts in the feature space. Specifically, Occlusion Instance Augmentation (OIA) is proposed to simulate real and diverse occlusion situations on the holistic image. These augmented images not only increase the number of occlusion samples in the training set, but also form pairs with the holistic images. Subsequently, a dual-stream architecture with a shared encoder is proposed to learn paired discriminative features from pairs of inputs. Without additional semantic information, an occluded-holistic feature sample-label pair can be automatically created. Then, a Feature Completion Decoder (FCD) is designed to complement the features of occluded regions by using learnable tokens to aggregate possible information from self-generated occluded features. Finally, we propose the Cross Hard Triplet (CHT) loss to further bridge the gap between complementing features and extracting features under the same ID. In addition, the Feature Completion Consistency (FC$^2$) loss is introduced to bring the generated completion feature distribution closer to the real holistic feature distribution. Extensive experiments over five challenging datasets demonstrate that the proposed FCFormer achieves superior performance and outperforms the state-of-the-art methods by significant margins on occluded datasets.
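The pairing idea behind OIA can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `paste_occluder` and `make_pair`, the uniform random placement, and the hard rectangular paste are all assumptions made for illustration. It only shows how pasting an occluder onto a holistic image yields an occluded-holistic pair without any extra annotation.

```python
import numpy as np

def paste_occluder(holistic, occluder, top, left):
    """Return a copy of `holistic` with `occluder` pasted at (top, left)."""
    occluded = holistic.copy()
    h, w = occluder.shape[:2]
    # Clip the patch so it stays inside the image bounds.
    h = min(h, holistic.shape[0] - top)
    w = min(w, holistic.shape[1] - left)
    occluded[top:top + h, left:left + w] = occluder[:h, :w]
    return occluded

def make_pair(holistic, occluder_library, rng):
    """Build a (holistic, occluded) pair using a randomly chosen occluder.

    Assumption: placement is uniform over the image; the paper's actual
    placement policy is not described in this summary.
    """
    occluder = occluder_library[int(rng.integers(len(occluder_library)))]
    top = int(rng.integers(0, holistic.shape[0]))
    left = int(rng.integers(0, holistic.shape[1]))
    return holistic, paste_occluder(holistic, occluder, top, left)

# Toy data: a black 256x128 "pedestrian" image and one white occluder patch.
rng = np.random.default_rng(0)
image = np.zeros((256, 128, 3), dtype=np.uint8)
library = [np.full((64, 48, 3), 255, dtype=np.uint8)]
holistic, occluded = make_pair(image, library, rng)
```

Because the occluded image is derived from the holistic one, both share the same identity label, which is what lets the pair act as its own supervision signal.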

Paper Structure (19 sections, 21 equations, 15 figures, 9 tables, 1 algorithm)

Figures (15)

  • Figure 1: Illustration of part/complementary occlusion and our proposed feature completion paradigm. FCFormer implicitly exploits neighboring region information by using a transformer decoder to recover missing features in occluded regions.
  • Figure 2: Examples of occluded pedestrians and occlusion augmentation. (a) shows real-world occlusion scenarios. (b) and (c) show the augmentation strategies introduced by OAMN and our proposed OIA, respectively.
  • Figure 3: Overall architecture of the Feature Completion Transformer (FCFormer). FCFormer consists of three modules and two losses: Occlusion Instance Augmentation, a dual-stream architecture, a Feature Completion Stream, the Cross Hard Triplet loss, and the Feature Completion Consistency loss. The holistic-occluded sample pairs generated by OIA are fed into the dual-stream architecture with a shared encoder. The non-shared parts of the dual-stream architecture are trained for their specific tasks. The FCD then takes the learnable tokens and occluded features as input to recover holistic features. We propose the CHT loss to allow the model to better perform metric learning among three different modal features (occlusion, holistic, and completion features). Finally, the FC$^2$ loss is proposed to guide the FCD to generate a completion feature sufficiently similar to the holistic feature. In the test stage, the features from the three branches are used for retrieval.
  • Figure 4: Some occlusion samples from the Occlusion Instance Library.
  • Figure 5: Schematic diagram of Occlusion Instance Augmentation.
  • ...and 10 more figures
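
The Figure 3 caption describes the CHT loss as metric learning across occlusion, holistic, and completion features. The exact formulation is not given in this summary, so the sketch below is only one plausible reading: a standard batch-hard triplet loss in which anchors come from one branch and positives and negatives are mined from another. The function names, the margin value of 0.3, and the Euclidean distance are assumptions.

```python
import numpy as np

def pairwise_dist(a, b):
    """Euclidean distance between every row of `a` and every row of `b`."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def cross_hard_triplet(anchor_feats, other_feats, labels, margin=0.3):
    """Batch-hard triplet loss with anchors and candidates from different branches.

    Assumption: both feature sets come from the same batch, so `labels`
    applies row-by-row to both of them.
    """
    dist = pairwise_dist(anchor_feats, other_feats)
    same_id = labels[:, None] == labels[None, :]
    # Hardest positive: the farthest same-ID feature in the other branch.
    hardest_pos = np.where(same_id, dist, -np.inf).max(axis=1)
    # Hardest negative: the closest different-ID feature in the other branch.
    hardest_neg = np.where(~same_id, dist, np.inf).min(axis=1)
    return float(np.maximum(hardest_pos - hardest_neg + margin, 0.0).mean())

# Toy batch: two identities, two samples each, 2-D features.
labels = np.array([0, 0, 1, 1])
holistic = np.array([[0.0, 0.0], [0.0, 0.1], [5.0, 5.0], [5.0, 5.1]])
completed = holistic + 0.01  # stand-in for decoder outputs close to the targets
loss_separated = cross_hard_triplet(completed, holistic, labels)
loss_collapsed = cross_hard_triplet(np.zeros((4, 2)), np.zeros((4, 2)), labels)
```

In this toy batch the loss vanishes when completed features sit near the holistic features of the same identity, and it equals the margin when all features collapse to a single point. Applying the same function to each pair of branches would give a cross-modal objective over the three feature types.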