ReIDMamba: Learning Discriminative Features with Visual State Space Model for Person Re-Identification

Hongyang Gu, Qisong Yang, Lei Pu, Siming Han, Yao Ding

TL;DR

ReIDMamba addresses the scalability and memory bottlenecks of Transformer-based person re-identification by introducing a pure Mamba-based framework. It integrates a multi-granularity feature extractor (MGFE) and ranking-aware triplet regularization (RATR) to produce diverse, discriminative features across multiple granularities. The approach achieves state-of-the-art results on five ReID benchmarks with reduced parameter count and faster inference compared to Transformer counterparts. This work demonstrates the potential of Vision Mamba architectures for efficient, high-performance ReID in resource-constrained settings.

Abstract

Extracting robust discriminative features is a critical challenge in person re-identification (ReID). While Transformer-based methods have successfully addressed some limitations of convolutional neural networks (CNNs), such as their local processing nature and the information loss caused by convolution and downsampling operations, they still face a scalability issue: memory and computational requirements grow quadratically with the length of the input sequence. To overcome this, we propose a pure Mamba-based person ReID framework named ReIDMamba. Specifically, we design a strong Mamba-based baseline that leverages fine-grained, discriminative global features by introducing multiple class tokens. To further enhance robust feature learning within Mamba, we design two novel techniques. First, the multi-granularity feature extractor (MGFE) module, built on a multi-branch architecture and class token fusion, forms multi-granularity features, enhancing both discrimination ability and fine-grained coverage. Second, the ranking-aware triplet regularization (RATR) reduces redundancy in the features from multiple branches; it enhances the diversity of multi-granularity features by incorporating both intra-class and inter-class diversity constraints, thus improving the robustness of person features. To our knowledge, this is the first work to integrate a purely Mamba-driven approach into ReID research. ReIDMamba has only one-third the parameters of TransReID, along with lower GPU memory usage and faster inference throughput. Experimental results show that ReIDMamba achieves state-of-the-art performance on five person ReID benchmarks. Code is available at https://github.com/GuHY777/ReIDMamba.

Paper Structure

This paper contains 15 sections, 14 equations, 8 figures, 5 tables.

Figures (8)

  • Figure 1: Performance and efficiency comparisons between TransReID and our ReIDMamba model regarding throughput (a) and GPU memory (b).
  • Figure 2: Mamba-based strong baseline framework (a nonoverlapping partition is shown, $M=4,N=32$). $M$ class tokens are evenly distributed among $N$ image tokens to extract fine-grained global person features. Subsequently, the $M$ class tokens are concatenated to form the final features used for ReID.
  • Figure 3: The overall architecture of ReIDMamba (a nonoverlapping partition is shown, $M=4,N=32,G=3$). (a) ReIDMamba, built on the Mamba-based strong baseline, incorporates a multi-branch architecture to extract multi-granularity features from multiple levels. The core block of ReIDMamba is the (b) bi-directional Mamba block within Vim. To ensure that each branch inherently maintains multiple levels of granularity, the (c) multi-granularity feature extractor is used for token fusion. Finally, (d) ranking-aware triplet regularization is applied to enhance the diversity of features across the various granularities.
  • Figure 4: Comparison of different regularizations. (a) The features from the two branches are completely orthogonal, yet their relative similarity is identical, failing to achieve the goal of increasing diversity. (b) Ranking-aware triplet regularization enhances the diversity of the two features from both intra-class and inter-class perspectives.
  • Figure 5: Comparison of different token fusion operations on (a) MSMT17 and (b) DukeMTMC-reID datasets.
  • ...and 3 more figures
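
The token layout described in the Figure 2 caption can be illustrated with a minimal sketch. It is not taken from the authors' code: the function names are hypothetical, and the exact placement rule (one class token at the centre of each of $M$ equal segments of image tokens) is an assumption, since the caption only states that the class tokens are evenly distributed. The encoder is replaced by an identity mapping so that only the interleaving and the final concatenation are shown.

```python
def interleave_class_tokens(image_tokens, class_tokens):
    """Insert one class token at the centre of each equal-sized segment.

    Assumption (not confirmed by the source): "evenly distributed" means
    the N image tokens are split into M equal segments, each receiving
    one class token in its middle.
    """
    n, m = len(image_tokens), len(class_tokens)
    if m == 0 or n % m != 0:
        raise ValueError("this sketch assumes N is a positive multiple of M")
    seg = n // m
    half = seg // 2
    sequence, cls_positions = [], []
    for i, cls in enumerate(class_tokens):
        segment = image_tokens[i * seg:(i + 1) * seg]
        sequence.extend(segment[:half])
        cls_positions.append(len(sequence))  # index of this class token
        sequence.append(cls)
        sequence.extend(segment[half:])
    return sequence, cls_positions


def concat_class_features(encoded, cls_positions):
    """Concatenate the encoded class tokens into one feature vector."""
    feature = []
    for p in cls_positions:
        feature.extend(encoded[p])
    return feature


# Values from the caption: M = 4 class tokens, N = 32 image tokens.
# D = 2 is a toy embedding size chosen only for this illustration.
M, N, D = 4, 32, 2
image_tokens = [[float(j)] * D for j in range(N)]
class_tokens = [[-1.0 - i] * D for i in range(M)]

sequence, cls_positions = interleave_class_tokens(image_tokens, class_tokens)
encoded = sequence  # stand-in for the Mamba encoder (identity here)
feature = concat_class_features(encoded, cls_positions)

print(len(sequence))   # 36 tokens in total (N + M)
print(cls_positions)   # [4, 13, 22, 31]
print(len(feature))    # 8 values (M * D)
```

Under these assumptions the sequence fed to the encoder has $N + M$ tokens, and the final ReID feature has dimension $M \times D$, where $D$ is the per-token embedding size.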