State-Free Inference of State-Space Models: The Transfer Function Approach
Rom N. Parnichkun, Stefano Massaroli, Alessandro Moro, Jimmy T. H. Smith, Ramin Hasani, Mathias Lechner, Qi An, Christopher Ré, Hajime Asama, Stefano Ermon, Taiji Suzuki, Atsushi Yamashita, Michael Poli
TL;DR
The paper tackles the memory and compute cost of state-space models (SSMs) for sequence modeling by reframing them through their rational transfer function (RTF) representation. It introduces a state-free parallel inference algorithm that computes the impulse-response spectrum via a single FFT, achieving $O(\ell)$ space and $O(\ell \log \ell)$ time in the sequence length $\ell$, independent of state size. On Long Range Arena, RTF trains on average 35% faster than S4 layers while delivering state-of-the-art downstream performance among attention-free models, and it improves perplexity on WikiText103 when integrated into Hyena (Hyena-RTF). The paper also addresses stability through initialization and constraint analysis. These results suggest that RTF enables scalable, expressive, and efficient linear time-invariant sequence processing across domains, with practical implications for fast autoregressive inference and large-state models.
Abstract
We approach designing a state-space model for deep learning applications through its dual representation, the transfer function, and uncover a highly efficient sequence-parallel inference algorithm that is state-free: unlike other proposed algorithms, state-free inference does not incur any significant memory or computational cost with an increase in state size. We achieve this using properties of the proposed frequency-domain transfer function parametrization, which enables direct computation of its corresponding convolutional kernel's spectrum via a single Fast Fourier Transform. Our experimental results across multiple sequence lengths and state sizes illustrate, on average, a 35% training speed improvement over S4 layers (parametrized in the time domain) on the Long Range Arena benchmark, while delivering state-of-the-art downstream performance over other attention-free approaches. Moreover, we report improved perplexity in language modeling over a long convolutional Hyena baseline, by simply introducing our transfer function parametrization. Our code is available at https://github.com/ruke1ire/RTF.
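To make the idea of obtaining a kernel's spectrum from transfer function coefficients concrete, here is a minimal NumPy sketch. It is not taken from the linked repository. It assumes a single-channel transfer function of the form $H(z) = h_0 + \frac{b_1 z^{-1} + \dots + b_n z^{-n}}{1 + a_1 z^{-1} + \dots + a_n z^{-n}}$, and the names `rtf_kernel_spectrum`, `rtf_kernel`, and `impulse_response_recurrence` are hypothetical. Evaluating $H$ at the $\ell$-th roots of unity yields the spectrum of the circularly aliased impulse response, which matches the length-$\ell$ impulse response only when that response has decayed within $\ell$ steps, as it does in the stable example used here.

```python
import numpy as np


def rtf_kernel_spectrum(b, a, h0, seq_len):
    """Evaluate H(z) = h0 + (b_1 z^-1 + ... + b_n z^-n) / (1 + a_1 z^-1 + ... + a_n z^-n)
    at the seq_len-th roots of unity (non-negative frequencies only)."""
    b = np.asarray(b, dtype=float)
    a = np.asarray(a, dtype=float)
    n = len(b)
    assert len(a) == n and n < seq_len
    # Zero-pad both coefficient vectors to the sequence length, so the
    # working memory is O(seq_len) regardless of the state size n.
    coeffs = np.zeros((2, seq_len))
    coeffs[0, 1:n + 1] = b   # numerator:   0, b_1, ..., b_n, 0, ...
    coeffs[1, 0] = 1.0       # denominator: 1, a_1, ..., a_n, 0, ...
    coeffs[1, 1:n + 1] = a
    # One batched FFT call over both padded vectors: O(seq_len log seq_len).
    spec = np.fft.rfft(coeffs, axis=-1)
    return h0 + spec[0] / spec[1]


def rtf_kernel(b, a, h0, seq_len):
    """Time-domain kernel recovered from the spectrum (circularly aliased)."""
    return np.fft.irfft(rtf_kernel_spectrum(b, a, h0, seq_len), n=seq_len)


def impulse_response_recurrence(b, a, h0, seq_len):
    """Reference check: step the difference equation sample by sample."""
    n = len(b)
    g = np.zeros(seq_len)
    for t in range(seq_len):
        acc = b[t - 1] if 1 <= t <= n else 0.0
        for i in range(1, min(t, n) + 1):
            acc -= a[i - 1] * g[t - i]
        g[t] = acc
    g[0] += h0  # the feedthrough term only affects the first sample
    return g


# Illustrative coefficients (assumed): poles at 0.5 and -0.3, so the
# impulse response decays far below float precision within 64 steps.
b = [1.0, 0.5]
a = [-0.2, -0.15]
h0 = 0.3
k_fft = rtf_kernel(b, a, h0, seq_len=64)
k_ref = impulse_response_recurrence(b, a, h0, seq_len=64)
print(np.max(np.abs(k_fft - k_ref)))
```

The FFT path never materializes an $n$-dimensional state: the cost depends on the number of coefficients only through the zero-padding, which is the sense in which the sketch is state-free. The recurrence is included purely as a slow reference for comparison.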
