Sparse Mamba: Introducing Controllability, Observability, And Stability To Structural State Space Models

Emadeldeen Hamdan; Hongyi Pan; Ahmet Enis Cetin

TL;DR

This work introduces controllability and observability into the original Mamba SSM architecture, yielding the authors' Sparse-Mamba (S-Mamba) for natural language processing (NLP) applications, and enforces stability on the $A$ matrix in Mamba2 to improve the loss and perplexity of the model.

Abstract

Recent structured state space models (SSMs), such as Mamba and Mamba2, have outperformed transformers and large language models at small to medium scale while addressing their computational inefficiency. In this work, we introduce the concepts of controllability and observability into the original Mamba SSM architecture in our Sparse-Mamba (S-Mamba) for natural language processing (NLP) applications. Moreover, we enforce stability on the $n \times n$ $A$ matrix in Mamba2. The Mamba SSM architecture removes the need for the attention layers and multilayer perceptron blocks used in transformers. However, current Mamba models do not enforce controllability in the state-space equations when computing the $A$, $B$, $C$, and $D$ matrices at each time step, which leads to increased complexity and computational cost. Furthermore, the $A$ matrix in Mamba2 is not always stable. We demonstrate a reduction in parameters compared to the first published Mamba and Mamba2. We show a 5\% improvement in perplexity and a 3\% decrease in training time after enforcing controllability and observability on the original Mamba architecture in our proposed S-Mamba. We further enforce stability on the $A$ matrix in Mamba2 to improve the loss and perplexity of the model. The controllable and stable $n \times n$ state matrix $A$ is sparse and has only $n$ free parameters. Our approach ensures controllable/observable and stable SSMs, which will be the gateway to Mamba3.
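To make the "sparse, only $n$ free parameters" claim concrete, the following is a minimal sketch of the textbook controllable canonical form (a companion matrix), not the authors' implementation. The function names (`companion_ccf`, `controllability_matrix`, `is_stable_continuous`) and the example coefficients are illustrative assumptions, and stability is checked under the continuous-time convention that all eigenvalues of $A$ have negative real parts.

```python
import numpy as np

def companion_ccf(a):
    """Build (A, B) in controllable canonical form (textbook convention).

    `a` holds a_0..a_{n-1}, the coefficients of the characteristic
    polynomial s^n + a_{n-1} s^{n-1} + ... + a_0. These are the only
    n free parameters of A; all other entries are fixed zeros and ones.
    """
    a = np.asarray(a, dtype=float)
    n = a.size
    A = np.zeros((n, n))
    A[:-1, 1:] = np.eye(n - 1)  # fixed superdiagonal of ones
    A[-1, :] = -a               # last row carries the free parameters
    B = np.zeros((n, 1))
    B[-1, 0] = 1.0              # input enters the last state only
    return A, B

def controllability_matrix(A, B):
    """Stack [B, AB, A^2 B, ..., A^(n-1) B] column-wise."""
    cols = [B]
    for _ in range(A.shape[0] - 1):
        cols.append(A @ cols[-1])
    return np.hstack(cols)

def is_stable_continuous(A):
    """Assumed convention: stable iff every eigenvalue has negative real part."""
    return bool(np.all(np.linalg.eigvals(A).real < 0))

# Hypothetical example: (s + 1)(s + 2)(s + 3) = s^3 + 6 s^2 + 11 s + 6
A, B = companion_ccf([6.0, 11.0, 6.0])
rank = np.linalg.matrix_rank(controllability_matrix(A, B))
stable = is_stable_continuous(A)
nonzeros = int(np.count_nonzero(A))
```

In this form the pair $(A, B)$ is controllable for any choice of coefficients, because the controllability matrix always has full rank $n$. Stability still depends on the coefficients, so it has to be enforced separately on the free parameters.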

Paper Structure (15 sections, 25 equations, 2 figures, 4 tables)

Introduction
Background
State Space Representations
High-Order Polynomial Projection Operator (HiPPO)
Linear State-Space Layers (LSSL)
Structured State Spaces (S4)
Mamba
Mamba
Mamba 2
Sparse Mamba Using Controllable and Observable Forms
Controllability
Observability
Stable Mamba2
Experimental Results
Conclusion

Figures (2)

Figure 1: Block diagram analysis of controllable canonical form (CCF).
Figure 2: Block diagram analysis of observable canonical form (OCF).
