Shape-centered Representation Learning for Visible-Infrared Person Re-identification

Shuang Li, Jiaxu Leng, Ji Gan, Mengjingcheng Mo, Xinbo Gao

TL;DR

This work tackles the challenges of visible-infrared person re-identification (VI-ReID) by moving beyond appearance-only representations to exploit body shape as a modality-robust cue. It introduces Shape-centered Representation Learning (ScRL), which combines Infrared Shape Restoration (ISR), Shape Feature Propagation (SFP), and Appearance Feature Enhancement (AFE) to fuse shape and appearance features in a two-branch architecture. ISR restores infrared shapes at the feature level, SFP transfers shape-discriminative ability into the appearance stream for efficient inference, and AFE emphasizes shape-related appearance features through a two-stage cross-attention mechanism. Extensive experiments on SYSU-MM01, RegDB, and HITSZ-VCM demonstrate state-of-the-art VI-ReID performance with favorable accuracy and efficiency, highlighting the practical value of shape-centered representations for cross-modality person identification.

Abstract

Visible-Infrared Person Re-Identification (VI-ReID) plays a critical role in all-day surveillance systems. However, existing methods primarily focus on learning appearance features while overlooking body shape features, which not only complement appearance features but also exhibit inherent robustness to modality variations. Despite their potential, effectively integrating shape and appearance features remains challenging. Appearance features are highly susceptible to modality variations and background noise, while shape features often suffer from inaccurate infrared shape estimation due to the limitations of auxiliary models. To address these challenges, we propose the Shape-centered Representation Learning (ScRL) framework, which enhances VI-ReID performance by innovatively integrating shape and appearance features. Specifically, we introduce Infrared Shape Restoration (ISR) to restore inaccuracies in infrared body shape representations at the feature level by leveraging infrared appearance features. In addition, we propose Shape Feature Propagation (SFP), which enables the direct extraction of shape features from original images during inference with minimal computational complexity. Furthermore, we design Appearance Feature Enhancement (AFE), which utilizes shape features to emphasize shape-related appearance features while effectively suppressing identity-unrelated noise. Benefiting from the effective integration of shape and appearance features, ScRL demonstrates superior performance through extensive experiments. On the SYSU-MM01, HITSZ-VCM, and RegDB datasets, it achieves Rank-1 (mAP) accuracies of 76.1% (72.6%), 71.2% (52.9%), and 92.4% (86.7%), respectively, surpassing existing state-of-the-art methods.
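The Rank-1 and mAP figures quoted in the abstract are standard retrieval metrics. As a rough illustration of how they are typically computed from a query-gallery distance matrix, here is a minimal NumPy sketch. The function name `rank1_and_map` and its arguments are hypothetical, and dataset-specific protocol details (for example camera-based gallery filtering or repeated gallery sampling) are deliberately omitted, so this is not the evaluation code behind the reported numbers.

```python
import numpy as np


def rank1_and_map(dist, query_ids, gallery_ids):
    """Compute Rank-1 accuracy and mean average precision (mAP).

    dist: (num_query, num_gallery) distance matrix, smaller = more similar.
    query_ids, gallery_ids: identity labels for each query / gallery sample.
    Hypothetical helper for illustration; no camera filtering is applied.
    """
    dist = np.asarray(dist, dtype=np.float64)
    query_ids = np.asarray(query_ids)
    gallery_ids = np.asarray(gallery_ids)

    rank1_hits, average_precisions = [], []
    for q in range(dist.shape[0]):
        # Sort gallery samples from most to least similar for this query.
        order = np.argsort(dist[q], kind="stable")
        matches = (gallery_ids[order] == query_ids[q]).astype(np.float64)
        if matches.sum() == 0:
            continue  # no correct match in the gallery; skip this query

        # Rank-1: is the top-ranked gallery sample the right identity?
        rank1_hits.append(matches[0])

        # AP: mean of precision@k taken at the ranks of the correct matches.
        precision_at_k = np.cumsum(matches) / (np.arange(matches.size) + 1)
        average_precisions.append((precision_at_k * matches).sum() / matches.sum())

    return float(np.mean(rank1_hits)), float(np.mean(average_precisions))


if __name__ == "__main__":
    toy_dist = [[0.1, 0.5, 0.9],
                [0.2, 0.8, 0.3]]
    r1, mean_ap = rank1_and_map(toy_dist, query_ids=[0, 1], gallery_ids=[0, 1, 1])
    print(f"Rank-1 = {r1:.3f}, mAP = {mean_ap:.3f}")
```

In the toy example, the first query retrieves its match at rank 1, while the second retrieves its two matches at ranks 2 and 3, which lowers both metrics.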

Paper Structure (19 sections, 18 equations, 7 figures, 12 tables, 1 algorithm)

Figures (7)

  • Figure 1: Visible and infrared images with their corresponding body shapes. The orange box indicates an incorrect region of the infrared body shape.
  • Figure 2: Framework comparison of VI-ReID methods that exploit body shape. (a) Learning shape-related features through multi-task learning in CMMTL [huang2022cross]. (b) Learning diverse appearance features by decoupling and discarding shape features in SEFL [feng2023shape]. (c) Learning shape features and using them to enhance appearance features.
  • Figure 3: The pipeline of our proposed ScRL framework consists of two branches: the shape stream and the appearance stream. The shape stream includes the shape feature learning network $\bm E_{s}$ and Infrared Shape Restoration (ISR). $\bm E_{s}$ encodes the input shape into shape features $\hat{\bm F}_{s,i}$, with ISR applied at an intermediate stage to restore erroneous infrared shape features by leveraging appearance features ${\bm F}^{ir,1}_{i}$ and ${\bm F}^{ir,2}_{i}$. The appearance stream comprises the appearance feature learning network $\bm E_{a}$, the shape sub-network $\bm E_{\tilde{s}}$, and Appearance Feature Enhancement (AFE). $\bm E_{a}$ encodes the pedestrian image into appearance features $\bm F_{i}$. To improve inference efficiency, $\bm E_{\tilde{s}}$ encodes shape features $\Bar{\bm F}_{s,i}$ from the third block of $\bm E_{a}$, guided by Shape Feature Propagation (SFP). Finally, AFE employs a cascaded two-stage cross-attention mechanism, enhancing the interaction between $\Bar{\bm F}_{s,i}$ and $\bm F_{i}$, which results in shape-centered pedestrian feature representations.
  • Figure 4: Illustration of the proposed ISR, which recovers missing shape features from appearance features to restore infrared shapes at the feature level.
  • Figure 5: Illustration of the proposed AFE, which can mine shape-centered appearance features guided by shape features.
  • ...and 2 more figures
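
The Figure 3 caption describes AFE as a cascaded two-stage cross-attention between shape features and appearance features. The captions do not specify the exact wiring, so the following NumPy sketch is only one plausible reading, under loudly stated assumptions: single-head scaled dot-product attention with no learned projections, a first stage in which shape tokens query the appearance tokens, and a second stage in which appearance tokens query the stage-one output with a residual connection. All function names are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, context):
    """Single-head scaled dot-product attention (no learned projections).

    queries: (n_q, d), context: (n_ctx, d).
    Returns the attended features (n_q, d) and the weights (n_q, n_ctx).
    """
    d = queries.shape[-1]
    weights = softmax(queries @ context.T / np.sqrt(d), axis=-1)
    return weights @ context, weights


def two_stage_cross_attention(shape_tokens, appearance_tokens):
    """Illustrative cascade; the stage order and residual are assumptions."""
    # Stage 1 (assumed): shape tokens gather the appearance content
    # that is most related to each shape token.
    shape_guided, _ = cross_attention(shape_tokens, appearance_tokens)
    # Stage 2 (assumed): appearance tokens attend to the shape-guided
    # summary, and the result is added back as a residual.
    refined, weights = cross_attention(appearance_tokens, shape_guided)
    return appearance_tokens + refined, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    shape_tokens = rng.normal(size=(4, 8))        # 4 shape tokens, dim 8
    appearance_tokens = rng.normal(size=(6, 8))   # 6 appearance tokens, dim 8
    enhanced, attn = two_stage_cross_attention(shape_tokens, appearance_tokens)
    print(enhanced.shape, attn.shape)
```

A real implementation would normally add learned query, key, and value projections, multiple heads, and normalization layers; they are left out here to keep the data flow between the two stages visible.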