Parameter Efficient Mamba Tuning via Projector-targeted Diagonal-centric Linear Transformation

Seokil Ham, Hee-Seon Kim, Sangmin Woo, Changick Kim

TL;DR

This study proposes a novel PEFT method specialized for the Mamba architecture: Projector-targeted Diagonal-centric Linear Transformation (ProDiaL). ProDiaL adapts only the pretrained Projectors to new tasks through diagonal-centric linear transformation matrices, without directly fine-tuning the Projector weights.

Abstract

Despite the growing interest in the Mamba architecture as a potential replacement for the Transformer architecture, parameter-efficient fine-tuning (PEFT) approaches for Mamba remain largely unexplored. In our study, we introduce two key, insight-driven strategies for PEFT in the Mamba architecture: (1) While state-space models (SSMs) have been regarded as the cornerstone of the Mamba architecture, and are therefore expected to play a primary role in transfer learning, our findings reveal that Projectors, not SSMs, are the predominant contributors to transfer learning. (2) Based on this observation, we propose a novel PEFT method specialized for the Mamba architecture: Projector-targeted Diagonal-centric Linear Transformation (ProDiaL). ProDiaL adapts only the pretrained Projectors to new tasks through diagonal-centric linear transformation matrices, without directly fine-tuning the Projector weights. This targeted approach allows efficient task adaptation using less than 1% of the total parameters, and it exhibits strong performance across both vision and language Mamba models, highlighting its versatility and effectiveness.
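As a rough illustration of what adapting a frozen Projector through a diagonal-centric transformation could look like, the following NumPy sketch keeps the pretrained weight `W` fixed and trains only a transformation `T` that starts as the identity. The right-multiplication `W @ T`, the low-rank parameterization of the off-diagonal part, and all names and sizes here are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def transformed_projector(W, diag, A, B):
    """Return W @ T, where T = diag(diag) + A @ B (illustrative form only).

    W    : frozen pretrained projector weight, shape (d_out, d_in)
    diag : trainable diagonal entries of T, shape (d_in,)
    A, B : trainable factors of the off-diagonal part (assumed low-rank),
           shapes (d_in, rank) and (rank, d_in)
    """
    T = np.diag(diag) + A @ B
    return W @ T


def n_trainable(diag, A, B):
    """Count the parameters that would be updated (W is not among them)."""
    return diag.size + A.size + B.size


rng = np.random.default_rng(0)
d_out, d_in, rank = 64, 32, 2  # toy sizes, chosen only for the example

W = rng.standard_normal((d_out, d_in))  # stands in for a pretrained Projector
diag = np.ones(d_in)                    # T starts as the identity
A = np.zeros((d_in, rank))              # zero init keeps the off-diagonal part at 0
B = rng.standard_normal((rank, d_in))

W_new = transformed_projector(W, diag, A, B)
trainable = n_trainable(diag, A, B)
print("unchanged at init:", np.allclose(W_new, W))
print("trainable vs. full:", trainable, W.size)
```

With this initialization the adapted Projector equals the pretrained one, so training starts from the pretrained behavior, and in this toy setting only 160 values are trainable compared with 2048 entries in `W`.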

Paper Structure

This paper contains 31 sections, 7 equations, 5 figures, 16 tables, 1 algorithm.

Figures (5)

  • Figure 1: Overview of Mamba Architecture and Performance Comparison. (a) The Mamba block structure, illustrating key components including the Input-Projector (In-Proj), Output-Projector (Out-Proj), and State-Space Model (SSM). (b) Performance analysis of fine-tuning for Vision Mamba and Mamba LLM, showing that Projectors are essential for effective downstream task performance. (c) The radar chart illustrates the relative performance of our proposed method (ProDiaL) compared to other leading PEFT methods (Strong [halloran2024mamba], BitFit [zaken2021bitfit], LoRA [hu2021lora], DoRA [liu2024dora]) across multiple benchmarks, demonstrating strong performance in both vision and language tasks.
  • Figure 2: Analysis of a linear transformation matrix $T$. (a) The matrix closely resembles an identity matrix, with strong diagonal values and minimal off-diagonal values. (b) The accumulated gradient is concentrated along the diagonal, emphasizing the importance of training these elements for effective adaptation.
  • Figure 3: Overview of ProDiaL Architecture for Efficient Parameter Tuning in Mamba Models. A detailed view of how ProDiaL fine-tunes the Mamba architecture by focusing on Projector transformations. ProDiaL selectively updates the diagonal ($D_b$) and non-diagonal ($\epsilon$) matrices in the Projectors, enabling efficient learning with minimal parameters.
  • Figure 4: Performance comparison across four $T$ configurations in the Mamba model. The diagonal-centric approach achieves near-optimal performance with far fewer parameters (0.57M), supporting the validity of ProDiaL.
  • Figure S1: Block-Diagonal Matrix Design for ProDiaL. The diagram illustrates the block-diagonal structure of the transformation matrix $D_b$, with $r_b$ controlling the size and number of small block matrices ($x_1, \ldots, x_{r_b}$). As $r_b$ increases, the block size decreases.
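The relationship between $r_b$ and block size described for Figure S1 can be made concrete with a small NumPy sketch. It assumes, purely for illustration, that the $r_b$ blocks are square, equally sized (`dim // r_b`), and initialized to the identity; the helper names are hypothetical and not taken from the paper.

```python
import numpy as np


def identity_blocks(dim, r_b):
    """Return r_b identity blocks of equal size dim // r_b (assumed layout)."""
    if dim % r_b != 0:
        raise ValueError("dim must be divisible by r_b in this sketch")
    size = dim // r_b
    return [np.eye(size) for _ in range(r_b)]


def block_diagonal(blocks):
    """Place square blocks along the diagonal of an otherwise zero matrix."""
    total = sum(b.shape[0] for b in blocks)
    out = np.zeros((total, total))
    offset = 0
    for b in blocks:
        n = b.shape[0]
        out[offset:offset + n, offset:offset + n] = b
        offset += n
    return out


def block_param_count(dim, r_b):
    """Trainable entries in D_b: r_b blocks of (dim // r_b)^2 values each."""
    return sum(b.size for b in identity_blocks(dim, r_b))


dim = 8  # toy dimension for the example
for r_b in (1, 2, 4, 8):
    D_b = block_diagonal(identity_blocks(dim, r_b))
    print(r_b, dim // r_b, block_param_count(dim, r_b), D_b.shape)
```

Under these assumptions the number of trainable entries in $D_b$ is `dim**2 // r_b`: for `dim = 8` the counts are 64, 32, 16, and 8 as $r_b$ goes from 1 to 8, with $r_b = 1$ giving a full matrix and $r_b$ equal to `dim` giving a purely diagonal one.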