Task-Level Insights from Eigenvalues across Sequence Models
Rahel Rickenbach, Jelena Trisovic, Alexandre Didier, Jerome Sieber, Melanie N. Zeilinger
TL;DR
This work tackles the scalability bottleneck of softmax attention by analyzing eigenvalue spectra within a unified dynamical-systems framework (DSF) to compare attention-based models and linear state-space models (SSMs). By mapping masked attention and linear alternatives to a discrete-time linear parameter-varying (LPV) dynamical system, it links eigenvalue placement to memory and long-range dependency, revealing task-driven spectral signatures such as clusters near $1$ for long-memory tasks and near $0$ for selective memory. The empirical study across multiple benchmarks shows how architectural choices (gating, convolution, layer depth, and normalization) reshape the eigenvalue spectra and correspondingly affect performance; Mamba-2 sits between pure SSMs and attention, balancing memory and selectivity. The findings establish eigenvalue analysis as a principled, task-aware metric to guide initialization and architectural design, potentially informing spectral-aware training and model selection for long-context sequence modeling.
Abstract
Although softmax attention drives state-of-the-art performance for sequence models, its quadratic complexity limits scalability, motivating linear alternatives such as state-space models (SSMs). While these alternatives improve efficiency, their fundamental differences in information processing remain poorly understood. In this work, we leverage the recently proposed dynamical systems framework (DSF) to represent softmax, norm, and linear attention as dynamical systems, enabling a structured comparison with SSMs by analyzing their respective eigenvalue spectra. Since eigenvalues capture essential aspects of dynamical system behavior, we conduct an extensive empirical analysis across diverse sequence models and benchmarks. We first show that eigenvalues influence essential aspects of memory and long-range dependency modeling, revealing spectral signatures that align with task requirements. Building on these insights, we then investigate how architectural modifications in sequence models impact both eigenvalue spectra and task performance. This correspondence further strengthens the position of eigenvalue analysis as a principled metric for interpreting, understanding, and ultimately improving the capabilities of sequence models.
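The link between eigenvalue placement and memory can be illustrated with a deliberately simplified sketch. It assumes a scalar, time-invariant recurrence h_t = lam * h_{t-1} + x_t, which is not the input-dependent LPV system analyzed in the paper; the function names and the threshold eps are hypothetical and chosen only for illustration. In this toy setting an input seen k steps ago is weighted by lam**k, so an eigenvalue near 1 retains inputs for a long time and one near 0 forgets them almost immediately.

```python
import math


def run_recurrence(lam: float, inputs: list[float]) -> float:
    """Final state of h_t = lam * h_{t-1} + x_t, starting from h = 0."""
    h = 0.0
    for x in inputs:
        h = lam * h + x
    return h


def impulse_contribution(lam: float, k: int) -> float:
    """Weight that an input seen k steps ago still carries in the state."""
    return lam ** k


def memory_horizon(lam: float, eps: float = 1e-3) -> int:
    """Smallest k such that |lam|**k <= eps (illustrative threshold eps)."""
    if not 0.0 < abs(lam) < 1.0:
        raise ValueError("this sketch only covers 0 < |lam| < 1")
    return math.ceil(math.log(eps) / math.log(abs(lam)))


if __name__ == "__main__":
    # Compare an eigenvalue near 1, an intermediate one, and one near 0.
    for lam in (0.99, 0.5, 0.01):
        print(f"lam={lam}: horizon={memory_horizon(lam)} steps")
```

Under these assumptions, lam = 0.99 keeps an input above the threshold for several hundred steps, while lam = 0.01 drops it after two. In the models studied here the eigenvalues generally depend on the input, so this fixed-lam picture should be read as intuition only.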
