
Sparse Mamba: Introducing Controllability, Observability, And Stability To Structural State Space Models

Emadeldeen Hamdan, Hongyi Pan, Ahmet Enis Cetin

TL;DR

This work introduces controllability and observability into the original Mamba SSM architecture, yielding the authors' Sparse-Mamba (S-Mamba) for natural language processing (NLP) applications, and enforces stability of the $A$ matrix in Mamba2 to improve the model's loss and perplexity.

Abstract

Recent structured state space models (SSMs), such as Mamba and Mamba2, have outperformed transformers and large language models at small to medium scale while addressing their computational inefficiency. In this work, we introduce the concepts of controllability and observability to the original Mamba SSM architecture in our Sparse-Mamba (S-Mamba) for natural language processing (NLP) applications. Moreover, we enforce stability of the $n \times n$ $A$ matrix in Mamba2. The Mamba SSM architecture drops the need for the attention layers and multilayer perceptron blocks used in transformers. However, current Mamba models do not enforce controllability in the state-space equations when computing the $A$, $B$, $C$, and $D$ matrices at each time step, leading to increased complexity and computational cost. Furthermore, the $A$ matrix in Mamba2 is not always stable. We demonstrate a reduction in parameters compared to the first published Mamba and Mamba2. We show a 5\% improvement in perplexity and a 3\% decrease in training time after enforcing controllability and observability on the original Mamba architecture in our proposed S-Mamba. We further enforce stability of the $A$ matrix in Mamba2 to improve the loss and perplexity of the model. The controllable and stable $n \times n$ state matrix $A$ is sparse, and it has only $n$ free parameters. Our novel approach will ensure controllable/observable and stable SSMs, which will be the gate key for Mamba3.
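
To make the claim about a sparse $n \times n$ state matrix with only $n$ free parameters concrete, the following is a minimal NumPy sketch of a state matrix in controllable canonical form. It is an illustration under stated assumptions, not the authors' implementation: the helper names (`ccf_matrices`, `controllability_matrix`, `is_stable`), the sign and ordering convention of the last row, and the continuous-time stability test (all eigenvalues with negative real part) are choices made here for the example. The abstract does not say how S-Mamba parameterizes or enforces stability.

```python
import numpy as np

def ccf_matrices(a):
    """Build (A, B) in controllable canonical form (one common convention).

    `a` holds the n coefficients a_0 .. a_{n-1} of the monic characteristic
    polynomial s^n + a_{n-1} s^{n-1} + ... + a_0. These are the only free
    parameters; every other entry of A is fixed to 0 or 1.
    """
    a = np.asarray(a, dtype=float)
    n = a.size
    A = np.zeros((n, n))
    A[:-1, 1:] = np.eye(n - 1)  # fixed superdiagonal of ones
    A[-1, :] = -a               # last row carries the n free parameters
    B = np.zeros((n, 1))
    B[-1, 0] = 1.0
    return A, B

def controllability_matrix(A, B):
    """Stack [B, AB, ..., A^(n-1) B] column-wise."""
    n = A.shape[0]
    cols = [B]
    for _ in range(n - 1):
        cols.append(A @ cols[-1])
    return np.hstack(cols)

def is_stable(A):
    """Continuous-time test (assumed here): all eigenvalues in the open left half-plane."""
    return bool(np.all(np.linalg.eigvals(A).real < 0))

# Illustrative choice: place the poles at -1, -2, -3 so that A is stable by construction.
coeffs = np.poly([-1.0, -2.0, -3.0])  # monic polynomial, highest degree first
a = coeffs[1:][::-1]                  # reorder to a_0, a_1, a_2
A, B = ccf_matrices(a)

rank = np.linalg.matrix_rank(controllability_matrix(A, B))
print("controllable:", rank == A.shape[0])
print("stable:", is_stable(A))
print("nonzero entries:", np.count_nonzero(A), "of", A.size)
```

In this form the pair $(A, B)$ is controllable for any choice of the $n$ coefficients, so only stability depends on their values. Choosing the coefficients from a set of left-half-plane poles, as done above, is one simple way to guarantee it.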

Paper Structure (15 sections, 25 equations, 2 figures, 4 tables)


Figures (2)

  • Figure 1: Block diagram analysis of controllable canonical form (CCF).
  • Figure 2: Block diagram analysis of observable canonical form (OCF).
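
For orientation, the two canonical forms named in the figure captions can be written out for a single-input, single-output system with characteristic polynomial $s^n + a_{n-1} s^{n-1} + \dots + a_0$. This is the standard textbook form under one common convention. The sign and ordering conventions, and the symbols $a_i$ and $c_i$, are assumptions made here and may differ from those used in the paper.

$$
A_c = \begin{bmatrix}
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1 \\
-a_0 & -a_1 & -a_2 & \cdots & -a_{n-1}
\end{bmatrix},
\qquad
B_c = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix},
\qquad
C_c = \begin{bmatrix} c_0 & c_1 & \cdots & c_{n-1} \end{bmatrix}
$$

$$
A_o = A_c^{\top}, \qquad B_o = C_c^{\top}, \qquad C_o = B_c^{\top}
$$

The observable canonical form is the dual (transpose) of the controllable one, so both share the same sparsity pattern: $n - 1$ fixed ones and $n$ free coefficients in $A$.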