Memba: Membrane-driven Parameter-Efficient Fine-Tuning for Mamba

Donghyun Lee, Yuhang Li, Ruokai Yin, Shiting Xiao, Priyadarshini Panda

TL;DR

Memba is a membrane-driven PEFT approach designed specifically for Mamba. It introduces Leaky Integrate Membrane (LIM) neurons, bio-inspired gating mechanisms that accumulate membrane potentials over time to enhance selective information retention.

Abstract

State Space Models (SSMs) have emerged as powerful alternatives to attention-based Transformers, with Mamba demonstrating impressive efficiency and scalability. As these models grow larger, Parameter-Efficient Fine-Tuning (PEFT) methods become critical for adapting pre-trained Mamba to downstream tasks without prohibitive computational costs. However, previous approaches simply apply traditional Transformer-tailored PEFT methods without addressing the unique temporal processing dynamics of SSMs. To address this limitation, we propose Memba, a membrane-driven PEFT approach specifically designed for Mamba. Memba introduces Leaky Integrate Membrane (LIM) neurons as bio-inspired gating mechanisms that naturally accumulate membrane potentials over time, enhancing selective information retention. By strategically combining LIM neurons with Low-Rank Adaptations (LoRA) and cross-layer membrane transfer, our approach significantly improves Mamba's temporal modeling capabilities. Extensive experiments across language and vision tasks demonstrate that Memba achieves substantial improvements over existing PEFT methods. The code is available at https://github.com/Intelligent-Computing-Lab-Yale/Memba.

Paper Structure

This paper contains 37 sections, 1 theorem, 33 equations, 9 figures, 13 tables, 1 algorithm.

Key Result

Theorem 1

Let $\mathcal{L}(\mathbf{y})$ be a twice-differentiable loss function, and let $\mathbf{y}_t = f_{\theta}(\mathbf{X}_t)$ be the output of a standard Mamba block. When augmented with our LIM mechanism, the effective output becomes $\hat{\mathbf{y}}_t = \mathbf{y}_t \odot g(\mathbf{u}_t)$, where … $\bar{\mathbf{u}}_t = \mathbb{E}[\mathbf{u}_t]$, $\boldsymbol{\varepsilon}_t = \mathbf{u}_t - \dots$

Figures (9)

  • Figure 1: Overview of Memba architecture and performance comparison. (a) Architecture and saliency map comparison between the original SSM and Memba on a Pathfinder dataset image. The pink lines in the architectures represent gating branches, and the green dashed circle indicates the target path to be identified. (b) Performance comparison on language (commonsense reasoning) and vision (VTAB-1k) tasks using Mamba-790M and Vim-S architectures, respectively. We compare Memba with SLL LoRA, Additional-scan, Affix-tuning, and LoRA in yoshimura2024mambapeft.
  • Figure 1: Ablation study on the impact of applying LoRA to different projection components in Memba-130M.
  • Figure 2: Overview of Memba architecture. On top of the original Mamba architecture, which includes embedding, normalization, linear layers, and SSM, Memba adds ① Leaky Integrate Membrane (LIM), ② Low-Rank Adaptations (LoRAs) on the input and output projections, and ③ membrane transfer across layers.
  • Figure 3: Overview of Leaky Integrate Membrane (LIM). Each token chunk is processed with LIM dynamics, and membrane outputs are concatenated to form the final sequence representation. In this figure, the input contains $L=8$ tokens split into $T=4$ chunks, with each chunk ($X_1, X_2, X_3, X_4$) containing 2 tokens.
  • Figure 4: Membrane-driven temporal processing in Memba. (a) The input image is divided into spatial chunks and flattened into a sequential representation. (b) Membrane distribution and (c) saliency map through the LIM neuron show how the LIM neuron tracks main features while progressively decreasing baseline potentials across chunks, demonstrating adaptive temporal attention. We provide more visualizations in the appendix.
  • ...and 4 more figures
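
The chunked processing described in the Figure 3 caption, together with the gating form $\hat{\mathbf{y}}_t = \mathbf{y}_t \odot g(\mathbf{u}_t)$ from Theorem 1, can be illustrated with a rough sketch. This is not the authors' implementation: the update rule `u = leak * u + chunk`, the `leak` value, the choice of a sigmoid for $g$, and the function names `lim_membrane` and `gate` are all assumptions made for illustration. Only the chunk layout ($L=8$ tokens, $T=4$ chunks, 2 tokens per chunk) and the concatenation of membrane outputs come from the caption.

```python
import math


def lim_membrane(x, num_chunks, leak=0.5, u0=None):
    """Chunk-wise leaky integration (illustrative assumption, not the paper's rule).

    x: list of L token vectors (each a list of floats); L must divide evenly.
    u0: optional initial membrane, one vector per position within a chunk.
    Returns (membrane, u_last): membrane has one vector per input token.
    """
    length = len(x)
    if length % num_chunks != 0:
        raise ValueError("sequence length must be divisible by num_chunks")
    size = length // num_chunks
    dim = len(x[0])
    # Membrane state carried from one chunk to the next.
    u = [[0.0] * dim for _ in range(size)] if u0 is None else [row[:] for row in u0]
    membrane = []
    for c in range(num_chunks):
        chunk = x[c * size:(c + 1) * size]
        # Assumed update: decay the previous potential, then add the new chunk.
        u = [[leak * u_ij + x_ij for u_ij, x_ij in zip(u_row, x_row)]
             for u_row, x_row in zip(u, chunk)]
        # Membrane outputs of all chunks are concatenated along the sequence.
        membrane.extend(row[:] for row in u)
    return membrane, u


def gate(y, membrane):
    """Element-wise gating y * g(u), with g assumed to be a sigmoid."""
    return [[y_ij / (1.0 + math.exp(-m_ij)) for y_ij, m_ij in zip(y_row, m_row)]
            for y_row, m_row in zip(y, membrane)]


# L = 8 one-dimensional tokens split into T = 4 chunks of 2 tokens each.
tokens = [[1.0] for _ in range(8)]
membrane, u_last = lim_membrane(tokens, num_chunks=4, leak=0.5)
print([m[0] for m in membrane])
# -> [1.0, 1.0, 1.5, 1.5, 1.75, 1.75, 1.875, 1.875]
```

With constant input, the potential rises chunk by chunk toward a fixed point, which is the accumulation behaviour the TL;DR refers to. The returned `u_last` could be passed as `u0` to a later call, loosely mirroring the cross-layer membrane transfer mentioned in the Figure 2 caption, though the paper's actual transfer mechanism is not specified here.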

Theorems & Definitions (1)

  • Theorem 1