Shape-centered Representation Learning for Visible-Infrared Person Re-identification
Shuang Li, Jiaxu Leng, Ji Gan, Mengjingcheng Mo, Xinbo Gao
TL;DR
This work tackles the challenges of visible-infrared person re-identification (VI-ReID) by moving beyond appearance-only representations to exploit body shape as a modality-robust cue. It introduces Shape-centered Representation Learning (ScRL), which combines Infrared Shape Restoration (ISR), Shape Feature Propagation (SFP), and Appearance Feature Enhancement (AFE) to fuse shape and appearance features in a two-branch architecture. ISR restores infrared shapes at the feature level, SFP transfers shape-discriminative ability into the appearance stream for efficient inference, and AFE emphasizes shape-related appearance features through a two-stage cross-attention mechanism. Extensive experiments on SYSU-MM01, RegDB, and HITSZ-VCM demonstrate state-of-the-art VI-ReID performance with favorable accuracy and efficiency, highlighting the practical value of shape-centered representations for cross-modality person identification.
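To make the cross-attention idea behind AFE concrete, the following is a minimal sketch of one generic scaled dot-product cross-attention step in NumPy. It is not the authors' implementation. The choice of shape tokens as queries and appearance tokens as keys and values, the token counts, the feature dimension, and the names `cross_attention`, `shape_tokens`, and `appearance_tokens` are all assumptions made for illustration. Learned projections and the second attention stage are omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_attention(queries, keys_values):
    """Single-head scaled dot-product cross-attention without learned projections.

    queries:     (num_queries, dim) tokens that decide what to attend to
    keys_values: (num_tokens, dim) tokens that are attended over
    """
    dim = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(dim)  # (num_queries, num_tokens)
    weights = softmax(scores, axis=-1)               # each row sums to 1
    attended = weights @ keys_values                 # (num_queries, dim)
    return attended, weights


# Hypothetical sizes: 4 shape tokens, 24 appearance tokens, 16-dim features.
rng = np.random.default_rng(0)
shape_tokens = rng.standard_normal((4, 16))
appearance_tokens = rng.standard_normal((24, 16))

# Assumption: shape tokens act as queries over the appearance tokens.
enhanced, attention = cross_attention(shape_tokens, appearance_tokens)
```

Each row of `attention` is a distribution over appearance tokens, so each output row is a weighted mix of appearance features selected by one shape token. That is the general sense in which shape cues could emphasize shape-related appearance features.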
Abstract
Visible-Infrared Person Re-Identification (VI-ReID) plays a critical role in all-day surveillance systems. However, existing methods primarily focus on learning appearance features and overlook body shape features, which not only complement appearance features but are also inherently robust to modality variations. Despite this potential, effectively integrating shape and appearance features remains challenging: appearance features are highly susceptible to modality variations and background noise, while shape features often suffer from inaccurate infrared shape estimation due to the limitations of auxiliary models. To address these challenges, we propose the Shape-centered Representation Learning (ScRL) framework, which improves VI-ReID performance by integrating shape and appearance features. Specifically, we introduce Infrared Shape Restoration (ISR), which corrects inaccuracies in infrared body shape representations at the feature level by leveraging infrared appearance features. In addition, we propose Shape Feature Propagation (SFP), which enables shape features to be extracted directly from the original images during inference at minimal computational cost. Furthermore, we design Appearance Feature Enhancement (AFE), which uses shape features to emphasize shape-related appearance features while suppressing identity-unrelated noise. Extensive experiments show that this integration of shape and appearance features is effective: on the SYSU-MM01, HITSZ-VCM, and RegDB datasets, ScRL achieves Rank-1 (mAP) accuracies of 76.1% (72.6%), 71.2% (52.9%), and 92.4% (86.7%), respectively, surpassing existing state-of-the-art methods.
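For readers unfamiliar with the reported metrics, the sketch below shows how Rank-1 accuracy and mean average precision (mAP) are commonly computed from a query-to-gallery distance matrix. It is a simplified illustration, not the evaluation code used in the paper: the function name `rank1_and_map` and the toy distances and identity labels are hypothetical, and any dataset-specific protocol details are omitted.

```python
import numpy as np


def rank1_and_map(dist, query_ids, gallery_ids):
    """Compute Rank-1 accuracy and mAP.

    dist:        (num_query, num_gallery) distances, smaller means more similar
    query_ids:   (num_query,) identity label of each query
    gallery_ids: (num_gallery,) identity label of each gallery item
    """
    rank1_hits, average_precisions = [], []
    for i in range(dist.shape[0]):
        order = np.argsort(dist[i])                   # gallery sorted nearest first
        matches = gallery_ids[order] == query_ids[i]  # True where identity matches
        if not matches.any():
            continue  # skip queries with no correct match in the gallery
        rank1_hits.append(float(matches[0]))          # is the top result correct?
        hit_positions = np.flatnonzero(matches)       # 0-based ranks of correct items
        # Precision at each correct item: (hits so far) / (rank of that item).
        precisions = np.arange(1, len(hit_positions) + 1) / (hit_positions + 1)
        average_precisions.append(precisions.mean())
    return float(np.mean(rank1_hits)), float(np.mean(average_precisions))


# Toy example (hypothetical values): 2 queries against a 3-item gallery.
dist = np.array([[0.1, 0.9, 0.4],
                 [0.2, 0.8, 0.3]])
query_ids = np.array([0, 1])
gallery_ids = np.array([0, 1, 0])

rank1, mean_ap = rank1_and_map(dist, query_ids, gallery_ids)
```

In this toy case the first query retrieves both of its matches at the top of the ranking, while the second query finds its only match at rank 3, so Rank-1 is 0.5 and mAP is (1 + 1/3) / 2 = 2/3.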
