ReIDMamba: Learning Discriminative Features with Visual State Space Model for Person Re-Identification

Hongyang Gu; Qisong Yang; Lei Pu; Siming Han; Yao Ding

TL;DR

ReIDMamba addresses the scalability and memory bottlenecks of Transformer-based person re-identification with a pure Mamba-based framework. It combines a multi-granularity feature extractor (MGFE) with ranking-aware triplet regularization (RATR) to produce diverse, discriminative multi-granularity features. The approach achieves state-of-the-art results on five ReID benchmarks with about one-third the parameters of TransReID, lower GPU memory usage, and faster inference. This work demonstrates the potential of Vision Mamba architectures for efficient, high-performance ReID in resource-constrained settings.

Abstract

Extracting robust, discriminative features is a critical challenge in person re-identification (ReID). Transformer-based methods have addressed some limitations of convolutional neural networks (CNNs), such as their local processing nature and the information loss caused by convolution and downsampling operations, but they still face a scalability issue: memory and computational requirements grow quadratically with the length of the input sequence. To overcome this, we propose a pure Mamba-based person ReID framework named ReIDMamba. Specifically, we design a strong Mamba-based baseline that leverages fine-grained, discriminative global features by introducing multiple class tokens. To further enhance robust feature learning within Mamba, we design two novel techniques. First, the multi-granularity feature extractor (MGFE) module, built on a multi-branch architecture with class token fusion, forms multi-granularity features, enhancing both discriminative ability and fine-grained coverage. Second, ranking-aware triplet regularization (RATR) reduces redundancy among features from multiple branches and enhances the diversity of multi-granularity features by incorporating both intra-class and inter-class diversity constraints, thus ensuring the robustness of person features. To our knowledge, this is the first work to integrate a purely Mamba-driven approach into ReID research. ReIDMamba has only one-third the parameters of TransReID, along with lower GPU memory usage and higher inference throughput. Experimental results show that ReIDMamba achieves state-of-the-art performance on five person ReID benchmarks. Code is available at https://github.com/GuHY777/ReIDMamba.
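
The scalability argument in the abstract can be made concrete with a small cost sketch. The snippet below is illustrative only and is not taken from the ReIDMamba code base: the 16-pixel patch size, the image resolutions, and the helper names `num_patch_tokens`, `attention_score_entries`, and `ssm_scan` are all assumptions chosen for the example. It contrasts the n x n score matrix that self-attention builds with a plain linear state space recurrence, which visits each token once while carrying a fixed-size state. Mamba itself uses input-dependent (selective) parameters and a hardware-aware scan, which this scalar toy version does not model.

```python
# Illustrative cost sketch; names and settings are assumptions, not the paper's code.

def num_patch_tokens(height, width, patch=16):
    """Count non-overlapping patch tokens for an image (assumed ViT-style patching)."""
    return (height // patch) * (width // patch)


def attention_score_entries(n_tokens):
    """Self-attention forms an n x n score matrix per head: quadratic in n."""
    return n_tokens * n_tokens


def ssm_scan(xs, a=0.9, b=1.0, c=1.0):
    """Scalar linear state space recurrence with fixed (non-selective) parameters.

    h_t = a * h_{t-1} + b * x_t
    y_t = c * h_t

    The loop touches each input once and keeps a single state value,
    so time is linear in len(xs) and the state size is constant.
    """
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x
        ys.append(c * h)
    return ys


if __name__ == "__main__":
    # Hypothetical input resolutions (height, width) for pedestrian crops.
    for height, width in [(256, 128), (384, 192), (512, 256)]:
        n = num_patch_tokens(height, width)
        print(f"{height}x{width}: tokens={n}, "
              f"attention entries={attention_score_entries(n)}, "
              f"scan steps={n}")
```

Under these assumptions, doubling the token count quadruples the number of attention score entries but only doubles the number of scan steps, which is the gap the abstract refers to when it contrasts Transformer-based ReID with a Mamba-based design.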
