Graph Signal Processing Meets Mamba2: Adaptive Filter Bank via Delta Modulation

Yehjin Shin, Seojin Kim, Noseong Park

Abstract

State-space models (SSMs) offer an efficient alternative to attention through linear-time recurrence. Mamba2, a recent SSM-based language model, uses selective input gating and a multi-head structure, enabling parallel computation and strong benchmark performance. However, its heads run their recurrences independently, and how they are jointly used has been neither structured nor analyzed. We propose Hierarchical ADaptive filter bank for Efficient SSMs (HADES), a Graph Signal Processing (GSP)-inspired framework that reinterprets Mamba2 as an adaptive filter bank on a line graph. The hierarchical architecture introduces two filter types, shared filters for global low-pass behavior and expert filters for local high-pass behavior, both obtained by applying a structured bias to the parameter Δ. Across benchmarks in language modeling, commonsense reasoning, and long-context retrieval, HADES performs comparably to baseline models including Mamba2 while using only 58.9% of the original parameters. HADES thus bridges GSP and neural sequence modeling, enabling efficient, hierarchical, and interpretable filtering within state-space models.
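To give some intuition for why a bias on Δ can shape a head's frequency behavior, the sketch below treats a single scalar head as the first-order recurrence h_t = exp(Δ·A)·h_{t-1} + Δ·x_t with A < 0 and a fixed Δ, and compares its gain at the highest frequency (ω = π) with its gain at DC (ω = 0). This is only an illustration under stated assumptions, not the HADES method: the scalar A = -1, the time-invariant Δ, the softplus parameterization, and the function names (`delta_from_bias`, `high_to_low_ratio`, and so on) are all hypothetical choices made here.

```python
import numpy as np

def delta_from_bias(raw, bias):
    """Assumed parameterization: Delta = softplus(raw + bias)."""
    return np.logaddexp(0.0, raw + bias)

def decay_from_delta(delta, A=-1.0):
    """Per-step decay a = exp(Delta * A) for a scalar A < 0 (assumption)."""
    return np.exp(delta * A)

def magnitude_response(delta, omega, A=-1.0):
    """|H(e^{j omega})| of h_t = a * h_{t-1} + Delta * x_t."""
    a = decay_from_delta(delta, A)
    return np.abs(delta / (1.0 - a * np.exp(-1j * omega)))

def high_to_low_ratio(delta, A=-1.0):
    """Gain at omega = pi divided by gain at omega = 0; equals (1 - a) / (1 + a)."""
    return magnitude_response(delta, np.pi, A) / magnitude_response(delta, 0.0, A)

if __name__ == "__main__":
    for bias in (-5.0, 0.0, 4.0):
        delta = delta_from_bias(0.0, bias)
        print(f"bias={bias:+.1f}  delta={delta:.4f}  "
              f"ratio={high_to_low_ratio(delta):.4f}")
```

In this toy setting a strongly negative bias yields a small Δ, a decay close to 1, and a ratio near 0, so high frequencies are heavily attenuated (low-pass). A positive bias yields a larger Δ and a ratio approaching 1, so the head passes high-frequency content largely unchanged. A first-order recurrence of this form never becomes strictly high-pass on its own, so the expert filters described in the abstract presumably involve more than this sketch captures.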

Paper Structure (77 sections, 31 equations, 12 figures, 15 tables)

Figures (12)

  • Figure 1: Distribution of layer-wise Effective Rank from the spectral responses of Mamba2 and HADES
  • Figure 2: Architectural Comparison between Mamba2 and HADES. Mamba2 applies all filters uniformly to every input token, whereas HADES employs a routing mechanism that selects and activates filters conditioned on the spectral residual $r_t$ and $\Delta_t$.
  • Figure 3: Passkey retrieval result of Mamba2 and HADES
  • Figure 4: Expert filter selection in Passkey Retrieval task
  • Figure 5: Spectrum of filter inputs and outputs from Mamba2 and HADES. The x-axis represents the Fourier frequency bins, and the y-axis shows the normalized magnitude of the Fourier coefficients, with larger values indicating stronger frequency components (see Appendix \ref{app:spectrum} for details).
  • ...and 7 more figures