Controllability Analysis of State Space-based Language Model

Mohamed Mabrok, Yalda Zafari

TL;DR

The paper presents the Influence Score, a principled, controllability-based metric derived from Mamba's state-space parameters to quantify how input tokens steer future states and outputs in SSM-based language models. It derives the score via a backward recurrence that combines direct and propagated influences, and validates it across three Mamba variants with a six-experiment suite, revealing scaling laws, recency bias, and mid-to-late layer specialization. The findings show that Influence Score grows with model size and data quality, exhibits robust behavior at scale, and provides a concrete diagnostic to interpret and compare SSM-based LLMs, with potential implications for prompt design and stability. Overall, the method offers a theoretically grounded, efficient alternative to gradient- or perturbation-based interpretability approaches for SSM architectures.

Abstract

State-space models (SSMs), particularly Mamba, have become powerful architectures for sequence modeling, yet their internal dynamics remain poorly understood compared to attention-based models. We introduce and validate the Influence Score, a controllability-based metric derived from the discretized state-space parameters of Mamba and computed through a backward recurrence analogous to system observability. The score quantifies how strongly a token at position k affects all later states and outputs. We evaluate this measure across three Mamba variants: mamba-130m, mamba-2.8b, and mamba-2.8b-slimpj, using six experiments that test its sensitivity to temperature, prompt complexity, token type, layer depth, token position, and input perturbations. The results show three main insights: (1) the Influence Score increases with model size and training data, reflecting model capacity; (2) Mamba exhibits consistent architectural patterns, including recency bias and concentrated influence in mid-to-late layers; and (3) emergent behaviors appear only at scale, with mamba-2.8b-slimpj uniquely prioritizing content words and reducing internal influence in the presence of noise. These findings establish the Influence Score as a practical diagnostic tool for interpreting and comparing SSM-based language models.
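The backward recurrence mentioned in the abstract can be illustrated with a toy example. The sketch below is not the paper's exact definition, which is not reproduced here. It assumes a diagonal, time-varying discretized SSM of the form h_t = A_t * h_{t-1} + B_t * x_t, y_t = C_t * h_t, and scores token k by summing a direct term (its immediate effect on y_k) and a propagated term (its effect on later outputs, attenuated by the state transitions). The function name `influence_scores`, the use of absolute values, and the toy parameters are all assumptions made for illustration.

```python
def influence_scores(a_bar, b_bar, c):
    """Illustrative backward recurrence for a diagonal, time-varying SSM.

    a_bar, b_bar, c: lists of length T, each entry a list of N floats
    (hypothetical per-step discretized parameters).
    Returns one score per token position.
    """
    T = len(a_bar)
    N = len(a_bar[0])
    # g[n] accumulates how strongly state channel n at step k reaches
    # the outputs at steps k, k+1, ..., T-1.
    g = [0.0] * N
    scores = [0.0] * T
    for k in range(T - 1, -1, -1):  # walk backward from the last token
        for n in range(N):
            # State written at step k is multiplied by a_bar[k+1] on the
            # next step; nothing follows the final step.
            decay = abs(a_bar[k + 1][n]) if k + 1 < T else 0.0
            # direct term |C_k| plus propagated term |A_{k+1}| * g_{k+1}
            g[n] = abs(c[k][n]) + decay * g[n]
        # The token enters the state through b_bar[k].
        scores[k] = sum(g[n] * abs(b_bar[k][n]) for n in range(N))
    return scores


# Toy parameters (assumed): one state channel, constant decay 0.5.
toy_scores = influence_scores([[0.5]] * 3, [[1.0]] * 3, [[1.0]] * 3)
print(toy_scores)
```

A single backward pass costs O(T·N), which is the kind of efficiency the summary contrasts with gradient- or perturbation-based approaches. In this toy setting with constant parameters, earlier positions simply accumulate more future terms, so the sketch says nothing about the recency bias reported for the real models.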

Paper Structure

This paper contains 24 sections, 14 equations, 10 figures, 3 tables, 1 algorithm.

Figures (10)

  • Figure 1: Suite of Experiments for Influence Score Evaluation. For each experimental condition, we report the mean and standard deviation of the Influence Score, the coefficient of variation (CV) as a stability indicator, and the Spearman correlation coefficients for monotonic trends (e.g., temperature vs. influence).
  • Figure 2: Cross-model comparison of mean Influence Scores across all six experiment categories.
  • Figure 3: Temperature sensitivity for the Mamba-2.8B-SlimPJ model. Each line represents mean Influence Score per temperature setting.
  • Figure 4: Prompt complexity analysis: influence distribution across prompt types.
  • Figure 5: Token-type influence comparison across models. Content tokens maintain consistently higher controllability.
  • ...and 5 more figures