Masked Attribute Description Embedding for Cloth-Changing Person Re-identification

Chunlei Peng, Boyu Wang, Decheng Liu, Nannan Wang, Ruimin Hu, Xinbo Gao

TL;DR

This paper tackles cloth-changing person re-identification by leveraging cloth-insensitive information derived from editable attribute descriptions. It introduces Masked Attribute Description Embedding (MADE), which masks cloth-related attributes extracted by SOLIDER and embeds the resulting masked descriptor into a Transformer-based backbone to fuse with image features across multiple levels. The method is trained with a combination of cross-entropy and triplet losses and evaluated on four benchmarks (PRCC, LTCC, Celeb-reID-light, LaST), where it achieves state-of-the-art results and shows robustness to attribute-detection noise. Overall, MADE demonstrates that editable, cloth-irrelevant attribute information can significantly enhance cloth-changing ReID while avoiding complex multi-modal encoders, with potential for extension to other cross-modality tasks.
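The training objective mentioned above, a combination of cross-entropy and triplet losses, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the unweighted sum, the margin of 0.3, the weight `w_tri`, and all function names are hypothetical choices made for the example.

```python
import math

def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample; logits is a list of floats."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Hinge on d(anchor, positive) - d(anchor, negative) + margin.

    The margin value is an assumption for illustration.
    """
    gap = euclidean(anchor, positive) - euclidean(anchor, negative) + margin
    return max(0.0, gap)

def total_loss(logits, label, anchor, positive, negative, margin=0.3, w_tri=1.0):
    """Identity (cross-entropy) loss plus a weighted triplet loss.

    The weight w_tri is hypothetical; the source does not state a weighting.
    """
    return cross_entropy(logits, label) + w_tri * triplet_loss(
        anchor, positive, negative, margin
    )
```

The cross-entropy term pushes each feature toward its identity class, while the triplet term pulls images of the same person together and pushes different people apart, regardless of what they wear.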

Abstract

Cloth-changing person re-identification (CC-ReID) aims to match persons who change clothes over long periods. The key challenge in CC-ReID is to extract clothing-independent features, such as face, hairstyle, body shape, and gait. Current research mainly focuses on modeling body shape using multi-modal biological features (such as silhouettes and sketches). However, it does not fully leverage the personal description information hidden in the original RGB image. Considering that certain attribute descriptions remain unchanged after a change of clothes, we propose a Masked Attribute Description Embedding (MADE) method that unifies personal visual appearance and attribute description for CC-ReID. Clothing-sensitive information that varies between outfits, such as color and type, is difficult to model effectively. To address this, we mask the clothing and color information in the personal attribute description extracted by an attribute detection model. The masked attribute description is then connected and embedded into Transformer blocks at various levels, fusing it with the low-level to high-level features of the image. This approach compels the model to discard clothing information. Experiments are conducted on several CC-ReID benchmarks, including PRCC, LTCC, Celeb-reID-light, and LaST. Results demonstrate that MADE effectively utilizes attribute description, enhancing cloth-changing person re-identification performance, and compares favorably with state-of-the-art methods. The code is available at https://github.com/moon-wh/MADE.
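The masking step described in the abstract can be illustrated with a small sketch. It assumes that a detected attribute list is turned into a binary vector and that cloth-related entries are forced to zero; the attribute vocabulary, the `upper_`/`lower_` prefix rule, and the function name are all hypothetical and do not come from the paper or from the attribute detector.

```python
# Hypothetical attribute vocabulary; a real list would come from the attribute detector.
ATTRIBUTES = [
    "female", "age_over_60", "long_hair", "glasses", "backpack",
    "upper_red", "upper_short_sleeve", "lower_trousers", "lower_black",
]

# Assumed rule: attributes with these prefixes describe clothing or its color.
CLOTH_PREFIXES = ("upper_", "lower_")

def mask_description(detected):
    """Return a binary vector over ATTRIBUTES with cloth-related entries zeroed."""
    vector = []
    for name in ATTRIBUTES:
        is_cloth = name.startswith(CLOTH_PREFIXES)
        # Keep a 1 only for detected attributes that are not cloth-related.
        vector.append(1 if (name in detected and not is_cloth) else 0)
    return vector

# The same person before and after changing clothes (illustrative detections).
before = {"female", "long_hair", "upper_red", "lower_trousers"}
after = {"female", "long_hair", "upper_short_sleeve", "lower_black"}

masked_before = mask_description(before)
masked_after = mask_description(after)
```

In this toy case both masked vectors are identical even though every clothing attribute differs, which is the property the masked description is meant to provide.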

Paper Structure (20 sections, 3 equations, 6 figures, 7 tables)

Figures (6)

  • Figure 1: An illustration of cloth-changing person re-identification over a long period of time and across cameras. The attributes of the person image are shown in the figure. Attributes related to clothes are marked in black, while attributes irrelevant to clothes are marked in blue. In the cloth-changing person re-identification scenario, many attributes unrelated to clothes remain consistent, such as hair, glasses, shoes, age, and gender, which could be useful for re-identification.
  • Figure 2: The framework of the Masked Attribute Description Embedding (MADE) method. We first extract an editable attribute description from the image through the Description Extraction and Mask (DEM) module. After the cloth-related attribute descriptions are masked and the description is converted into a binary vector, it is connected and embedded at different levels through Linear Projection to fuse with image features. Finally, we aggregate $f_{cls}^{v}$, $f_{des,2}^{m}$ and $f_{des,3}^{m}$ through Conv1D to obtain the person feature representation.
  • Figure 3: Examples of pedestrian attribute lists extracted using SOLIDER (attributes related to clothes are marked in black, while attributes unrelated to clothes are marked in blue).
  • Figure 4: Examples from the datasets used in this paper.
  • Figure 5: Retain ratio of clothes-irrelevant person attributes in each dataset. (a) PRCC, (b) LTCC, (c) Celeb-reID-light, (d) LaST, and (e) average statistic.
  • ...and 1 more figures
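
The fusion and aggregation steps named in the Figure 2 caption can be sketched in plain Python. This is an illustrative reading only: it assumes that "connected" means appending the linearly projected description as one extra token, and that the final Conv1D uses kernel size 1 over the three stacked features, which reduces to a learned weighted sum. All dimensions, weights, and function names are hypothetical.

```python
import random

def linear(x, weight, bias):
    """Linear projection y = W x + b, with weight given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]

def init_linear(in_dim, out_dim, rng):
    """Small random weights and zero bias, standing in for learned parameters."""
    weight = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    return weight, [0.0] * out_dim

def append_description_token(image_tokens, masked_vec, weight, bias):
    """Project the masked binary vector to the token width and append it as a token."""
    return image_tokens + [linear(masked_vec, weight, bias)]

def aggregate(features, channel_weights, bias=0.0):
    """Kernel-size-1 Conv1D over stacked features: a weighted sum across channels."""
    dim = len(features[0])
    return [
        sum(w * f[i] for w, f in zip(channel_weights, features)) + bias
        for i in range(dim)
    ]

rng = random.Random(0)
dim = 4                                   # hypothetical token width
masked_vec = [1, 0, 1, 0, 0]              # masked binary description
tokens = [[0.0] * dim for _ in range(3)]  # stand-in image tokens at one level
weight, bias = init_linear(len(masked_vec), dim, rng)
fused = append_description_token(tokens, masked_vec, weight, bias)

# Stand-ins for f_cls^v, f_des,2^m and f_des,3^m from the caption.
person_feature = aggregate(
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], channel_weights=[1.0, 1.0, 1.0]
)
```

In a real model the Transformer blocks would run between the append step and the aggregation, so that attention can mix the description token with the image tokens at each level.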