Neural Architecture Search based Global-local Vision Mamba for Palm-Vein Recognition

Huafeng Qin, Yuming Fu, Jing Chen, Mounim A. El-Yacoubi, Xinbo Gao, Feng Xi

TL;DR

A hybrid network named Global-local Vision Mamba (GLVM) is proposed to explicitly learn both the local correlations in images and the global dependencies among tokens for vein feature representation. A Global-local Alternate Neural Architecture Search method is also proposed, which searches for the optimal GLVM architecture alternately with an evolutionary algorithm, thereby improving recognition performance on vein recognition tasks.

Abstract

Owing to advantages such as high security, high privacy, and liveness recognition, vein recognition has received increasing attention in recent years. Recently, deep learning models such as Mamba have shown robust feature representation with linear computational complexity and have been successfully applied to visual tasks. However, although vision Mamba can capture long-distance feature dependencies, it unfortunately degrades local feature details. In addition, manually designing a Mamba architecture based on human prior knowledge is time-consuming and error-prone. In this paper, first, we propose a hybrid network structure named Global-local Vision Mamba (GLVM) to explicitly learn the local correlations in images and the global dependencies among tokens for vein feature representation. Second, we design a Multi-head Mamba to learn the dependencies along different directions, so as to improve the feature representation ability of vision Mamba. Third, to learn complementary features, we propose a ConvMamba block consisting of three branches, namely the Multi-head Mamba (MHMamba) branch, the Feature Iteration Unit (FIU) branch, and the Convolutional Neural Network (CNN) branch, where the Feature Iteration Unit branch fuses convolutional local features with Mamba-based global representations. Finally, a Global-local Alternate Neural Architecture Search (GLNAS) method is proposed to search for the optimal architecture of GLVM alternately with an evolutionary algorithm, thereby improving recognition performance on vein recognition tasks. We conduct rigorous experiments on three public palm-vein databases to evaluate the performance. The experimental results demonstrate that the proposed method outperforms representative approaches and achieves state-of-the-art recognition accuracy.
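
The alternating search described in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the two search spaces, their parameter names, and the `toy_fitness` function (a stand-in for evaluating a subnet sampled from a supernet) are hypothetical, and the evolution step is a simple mutate-and-keep-best loop rather than the authors' actual algorithm.

```python
import random

# Hypothetical search spaces; the real GLVM choices are not given in this summary.
GLOBAL_SPACE = {"mamba_heads": [2, 4, 8], "state_dim": [8, 16, 32]}
LOCAL_SPACE = {"conv_kernel": [3, 5, 7], "conv_channels": [32, 64, 128]}


def toy_fitness(arch):
    """Stand-in for validation accuracy: counts matches with an arbitrary target."""
    target = {"mamba_heads": 4, "state_dim": 16, "conv_kernel": 5, "conv_channels": 64}
    return sum(arch[key] == value for key, value in target.items())


def evolve(arch, space, rng, generations=10, offspring=8, mutation_rate=0.5):
    """Mutate only the keys in `space`; every other key of `arch` stays frozen."""
    best = dict(arch)
    for _ in range(generations):
        children = []
        for _ in range(offspring):
            child = dict(best)
            for key, choices in space.items():
                if rng.random() < mutation_rate:
                    child[key] = rng.choice(choices)
            children.append(child)
        candidate = max(children, key=toy_fitness)
        # Accept the best child only if it is at least as fit as the parent.
        if toy_fitness(candidate) >= toy_fitness(best):
            best = candidate
    return best


def initial_architecture():
    """Start from the first choice of every parameter in both spaces."""
    return {k: v[0] for space in (GLOBAL_SPACE, LOCAL_SPACE) for k, v in space.items()}


def alternate_search(rounds=3, seed=0):
    """Alternate between a global stage and a local stage for a few rounds."""
    rng = random.Random(seed)
    arch = initial_architecture()
    for _ in range(rounds):
        arch = evolve(arch, GLOBAL_SPACE, rng)  # global stage: local choices frozen
        arch = evolve(arch, LOCAL_SPACE, rng)   # local stage: global choices frozen
    return arch


if __name__ == "__main__":
    print(alternate_search())
```

Freezing one half of the architecture while evolving the other keeps each stage's search space small, which is the intuition behind alternating rather than searching both spaces jointly.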

Paper Structure (35 sections, 19 equations, 8 figures, 6 tables)

Figures (8)

  • Figure 1: The framework of the proposed Global-local Vision Mamba (GLVM). The Global-local Vision Mamba employs a dual-branch hybrid architecture which consists of a stem block, a patch embedding layer, $N$ ConvMamba blocks, and two classifiers.
  • Figure 2: The architecture of the ConvMamba block. (a) Each ConvMamba block consists of an MHMamba block, a Feature Interaction Unit, and two sub-convolution blocks, each of which includes $3$ convolutional layers followed by batch normalization (BatchNorm) and ReLU activation. (b) The basic operations in a Feature Interaction Unit, e.g., Flatten and Averagepool.
  • Figure 3: MHMamba block. (a) The detailed architecture of our MHMamba block, (b) the Mamba module in (a), and (c) the Multi-direction Scanning (MDS) mechanism in (a), which includes multiple scanning directions, e.g., vertical, flipped vertical, horizontal, and flipped horizontal.
  • Figure 4: Framework of the GLANAS. We split the search space into two parts, global space and local space, and alternately perform a two-stage search, with each stage of the search employing a weight entanglement strategy to sample subnets from the supernet.
  • Figure 5: Preprocessing results on three datasets. Original palm image and ROI on (a) TJU_PV dataset, (b) HKPU_PV dataset, and (c) VERA_PV dataset.
  • ...and 3 more figures
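
The Multi-direction Scanning idea in the Figure 3 caption can be illustrated with a small sketch. This is not the authors' implementation: the function name `multi_direction_scan` is hypothetical, tokens are plain integers instead of feature vectors, and the four directions are assumed to be horizontal, flipped horizontal, vertical, and flipped vertical.

```python
def multi_direction_scan(grid):
    """Flatten a 2-D grid of tokens (a list of rows) into four 1-D scan orders."""
    horizontal = [tok for row in grid for tok in row]       # row by row
    vertical = [tok for col in zip(*grid) for tok in col]   # column by column
    return {
        "horizontal": horizontal,
        "flipped_horizontal": horizontal[::-1],  # same path, reversed
        "vertical": vertical,
        "flipped_vertical": vertical[::-1],      # same path, reversed
    }


if __name__ == "__main__":
    # A 2 x 3 grid of patch indices standing in for patch embeddings.
    grid = [[0, 1, 2],
            [3, 4, 5]]
    for name, sequence in multi_direction_scan(grid).items():
        print(name, sequence)
```

Each sequence presents the same patches to a sequence model in a different order, so a one-directional scan such as Mamba's sees every patch with context from several spatial directions.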