Venturing into Uncharted Waters: The Navigation Compass from Transformer to Mamba

Yuchen Zou, Yineng Chen, Zuchao Li, Lefei Zhang, Hai Zhao

TL;DR

This survey provides a comprehensive discussion of essential research dimensions: the functioning of the Mamba mechanism and its foundation in the principles of structured state space models, along with proposed improvements to Mamba and its integration with various networks.

Abstract

Transformer, a deep neural network architecture, has long dominated the field of natural language processing and beyond. Nevertheless, the recent introduction of Mamba challenges its supremacy, sparks considerable interest among researchers, and gives rise to a series of Mamba-based models that have exhibited notable potential. This survey paper provides a comprehensive discussion of essential research dimensions, covering: (i) the functioning of the Mamba mechanism and its foundation in the principles of structured state space models; (ii) proposed improvements to Mamba and its integration with various networks, exploring its potential as a substitute for Transformers; (iii) the combination of Transformers and Mamba to compensate for each other's shortcomings. We also interpret Mamba and Transformer within the framework of kernel functions, allowing a comparison of their mathematical nature in a unified context. Our paper encompasses the vast majority of improvements related to Mamba to date.
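
The recurrence at the heart of structured state space models is simple enough to sketch. Below is a minimal, self-contained illustration (our own simplification, not code from the paper) of a discretized linear SSM: the state update h_t = Ā h_{t-1} + B̄ x_t with readout y_t = C h_t. The function and parameter names (`ssm_scan`, `dt`) are ours, and the input-dependent selectivity that distinguishes Mamba from earlier SSMs is deliberately omitted for clarity.

```python
import numpy as np

def ssm_scan(x, A, B, C, dt):
    """Run a discretized linear SSM over a 1-D input sequence.

    Sketch only: uses a first-order (Euler) discretization as a
    simplification of the zero-order hold used in practice, and keeps
    dt, B, C fixed, whereas Mamba makes them input-dependent.
    """
    N = A.shape[0]
    A_bar = np.eye(N) + dt * A   # discretized state matrix
    B_bar = dt * B               # discretized input projection
    h = np.zeros(N)
    ys = []
    for x_t in x:                # sequential scan over time steps
        h = A_bar @ h + B_bar * x_t   # state update
        ys.append(C @ h)              # readout
    return np.array(ys)

# Toy usage: a decaying diagonal A keeps the recurrence stable.
A = -np.eye(4)
B = np.ones(4)
C = np.ones(4) / 4
y = ssm_scan(np.sin(np.linspace(0, 3, 16)), A, B, C, dt=0.1)
print(y.shape)  # (16,)
```

Because the recurrence is linear in the state, this sequential loop can be replaced by a parallel scan at training time, which is one reason SSM-based models scale to long sequences.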

Paper Structure

This paper contains 30 sections, 35 equations, 2 figures, and 3 tables.

Figures (2)

  • Figure 1: An overview of Mamba, including the developmental trajectory of structured state space models, a discussion on the substitutability of Mamba and Transformers, and their combination.
  • Figure 2: Representative works from the emergence of Mamba to the present [gu2023mamba, pioro2024moe, de2024griffin, lieber2024jamba, dao2024transsm, shams2024ssamba, ma2024u, zhu2024vision, liu2024vmamba, ruan2024vm, chen2024mim, patro2024simba, Heracles, liu2024robomamba, yang2024vivim, liang2024pointmamba, li2024videomamba, chen2024changemamba, qian2024smcd, gong2024nnmamba].