State-Free Inference of State-Space Models: The Transfer Function Approach

Rom N. Parnichkun; Stefano Massaroli; Alessandro Moro; Jimmy T. H. Smith; Ramin Hasani; Mathias Lechner; Qi An; Christopher Ré; Hajime Asama; Stefano Ermon; Taiji Suzuki; Atsushi Yamashita; Michael Poli

TL;DR

The paper addresses the memory and compute cost of state-space models (SSMs) for sequence modeling by parametrizing them through their dual representation, the rational transfer function (RTF). It introduces a state-free parallel inference algorithm that computes the impulse-response spectrum via a single FFT, requiring $O(\ell)$ space and $O(\ell \log \ell)$ time for sequence length $\ell$, with no significant dependence on state size. On Long Range Arena, RTF trains on average 35% faster than S4 while reaching state-of-the-art downstream performance among attention-free models, and introducing the RTF parametrization into Hyena (Hyena-RTF) improves perplexity on WikiText103. The paper also addresses stability through initialization and an analysis of coefficient constraints. These results suggest that RTF is a scalable, expressive, and efficient parametrization for linear time-invariant sequence models, with practical benefits for fast autoregressive inference and large-state models.

Abstract

We approach designing a state-space model for deep learning applications through its dual representation, the transfer function, and uncover a highly efficient sequence parallel inference algorithm that is state-free: unlike other proposed algorithms, state-free inference does not incur any significant memory or computational cost with an increase in state size. We achieve this using properties of the proposed frequency domain transfer function parametrization, which enables direct computation of its corresponding convolutional kernel's spectrum via a single Fast Fourier Transform. Our experimental results across multiple sequence lengths and state sizes illustrate, on average, a 35% training speed improvement over S4 layers -- parametrized in the time domain -- on the Long Range Arena benchmark, while delivering state-of-the-art downstream performance over other attention-free approaches. Moreover, we report improved perplexity in language modeling over a long convolutional Hyena baseline, by simply introducing our transfer function parametrization. Our code is available at https://github.com/ruke1ire/RTF.
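The kernel computation described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes the transfer function has the form $H(z) = h_0 + \frac{b_1 z^{-1} + \dots + b_n z^{-n}}{1 + a_1 z^{-1} + \dots + a_n z^{-n}}$, and the function names `rtf_kernel` and `reference_kernel` are hypothetical. Sampling $H$ at the $\ell$-th roots of unity yields a time-aliased impulse response, so the result matches the true length-$\ell$ kernel only up to a tail that is negligible when the system decays quickly.

```python
import numpy as np


def rtf_kernel(b, a, h0, seq_len):
    """Length-seq_len convolution kernel from RTF coefficients (sketch).

    The coefficients are zero-padded to the sequence length and transformed
    in one batched FFT call; no n-dimensional state is ever materialized.
    """
    b = np.asarray(b, dtype=float)
    a = np.asarray(a, dtype=float)
    n = a.shape[0]
    if b.shape[0] != n or seq_len <= n:
        raise ValueError("need len(b) == len(a) < seq_len")
    padded = np.zeros((2, seq_len))
    padded[0, 1:n + 1] = b      # numerator:   b_1 z^-1 + ... + b_n z^-n
    padded[1, 0] = 1.0          # denominator: 1 + a_1 z^-1 + ... + a_n z^-n
    padded[1, 1:n + 1] = a
    spec = np.fft.rfft(padded, axis=-1)
    H = spec[0] / spec[1] + h0  # spectrum of the kernel
    return np.fft.irfft(H, n=seq_len)


def reference_kernel(b, a, h0, seq_len):
    """Same impulse response via the time-domain recurrence, for checking."""
    n = len(a)
    g = np.zeros(seq_len)
    for t in range(seq_len):
        acc = b[t - 1] if 1 <= t <= n else 0.0
        for k in range(1, min(t, n) + 1):
            acc -= a[k - 1] * g[t - k]
        g[t] = acc
    g[0] += h0                  # feedthrough term acts only at t = 0
    return g


if __name__ == "__main__":
    b, a, h0 = [1.0, 0.5], [-0.2, -0.15], 0.7   # poles at 0.5 and -0.3
    print(np.allclose(rtf_kernel(b, a, h0, 64), reference_kernel(b, a, h0, 64)))
```

The padded arrays have length $\ell$ regardless of $n$, which is where the $O(\ell)$ memory and $O(\ell \log \ell)$ time come from in this sketch.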

Paper Structure

This paper contains 49 sections, 6 theorems, 60 equations, 5 figures, 12 tables, and 1 algorithm.

Introduction
Preliminaries and Related Work
Sequence Modeling with Convolutions
State-Space Realization of Convolutions
State-space representations
Training SSMs in the frequency domain
Transfer Function Representation
Coordinate invariance of the transfer function
Expressivity of the transfer function
State-Free Parallel Inference
Fast Companion Recurrence
Stable Parametrization
Experimental Results
Efficiency Profiling
Modeling Long Range Dependencies
...and 34 more sections

Key Result

Lemma 3.1

Coefficients $a,b$ are invariant under any invertible change of variables.
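A small numerical check can make the lemma concrete. The sketch below assumes a single-input, single-output system $(A, B, C)$ and a change of variables $\tilde{x} = Tx$, so that $\tilde{A} = TAT^{-1}$, $\tilde{B} = TB$, $\tilde{C} = CT^{-1}$. The helper name `tf_coefficients` is hypothetical; it recovers the denominator from the characteristic polynomial of $A$ and the numerator from the identity $\det(zI - A + BC) = \det(zI - A)\,(1 + C(zI - A)^{-1}B)$.

```python
import numpy as np


def tf_coefficients(A, B, C):
    """Return (b, a) of C (zI - A)^{-1} B for a SISO system (sketch).

    a: denominator coefficients a_1..a_n (the leading 1 is dropped).
    b: numerator coefficients b_1..b_n.
    """
    den = np.real(np.poly(A))                          # det(zI - A)
    num = np.real(np.poly(A - np.outer(B, C))) - den   # det(zI - A) * C (zI - A)^{-1} B
    return num[1:], den[1:]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 4
    A = rng.normal(size=(n, n)) / n
    B = rng.normal(size=n)
    C = rng.normal(size=n)
    T = rng.normal(size=(n, n)) + n * np.eye(n)  # assumed invertible for this draw
    T_inv = np.linalg.inv(T)

    b1, a1 = tf_coefficients(A, B, C)
    b2, a2 = tf_coefficients(T @ A @ T_inv, T @ B, C @ T_inv)
    print(np.allclose(a1, a2), np.allclose(b1, b2))
```

The state-space matrices change under $T$, but the two coefficient vectors agree up to floating-point error, which is the sense in which the transfer function is a coordinate-free description of the system.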

Figures (5)

Figure 1: An illustration depicting the scaling of memory consumption on a scan-based algorithm (S5) and the proposed state-free inference algorithm denoted as RTF. We note that with larger state sizes, inference with S5 becomes prohibitively memory-intensive.
Figure 2: (a) The rational transfer function (RTF) representation comprises numerator and denominator polynomial coefficients $\textbf{b}$ and $\textbf{a}$, and the feedforward term $h_0$. (b) illustrates the proposed state-free parallel inference algorithm. The key to efficient state-free inference lies in casting $\textbf{b}$ and $\textbf{a}$ onto the sequence length for computing the convolutional filter $(h_i)_{i \in [\ell]}$. (c) illustrates the recurrent form of RTF which can be used for fast single-step inference. Here we denote the $i$-th state at time $t$ as $x_t^{i}$.
Figure 3: Latency profiles for a single RTF, S4D, and S4 layer at various state sizes. RTF consistently shows lower parallel inference latency across the tasks and state sizes tested.
Figure 4: The space of stable roots of a 2nd order polynomial with conjugate roots is illustrated with a green-blue colormap. The figure on the right overlays the space of coefficients that obey Montel's constraints in pink.
Figure 5: This figure illustrates the scaling of parallel inference latency on S5 and RTF across various sequence lengths and state sizes. When comparing equal expansion factors, it becomes evident that RTF provides lower latencies across different sequence lengths.
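The recurrent form in Figure 2(c) can be illustrated with a standard controllable canonical (companion) realization. This is a sketch under that assumption and may differ in detail from the paper's fast companion recurrence; the names `rtf_step` and `run_recurrence` are hypothetical. The state holds the last $n$ values of an internal signal $w$, so each step costs $O(n)$: two dot products and a shift.

```python
import numpy as np


def rtf_step(x, u, b, a, h0):
    """One recurrent step; x[i] plays the role of the i-th state x_t^i.

    Assumes H(z) = h0 + (b_1 z^-1 + ... + b_n z^-n) / (1 + a_1 z^-1 + ... + a_n z^-n).
    """
    y = b @ x + h0 * u       # output uses the current state and the feedthrough
    w = u - a @ x            # new internal value entering the delay line
    x_next = np.empty_like(x)
    x_next[0] = w
    x_next[1:] = x[:-1]      # shift the older values down by one slot
    return x_next, y


def run_recurrence(u_seq, b, a, h0):
    """Apply rtf_step over a whole input sequence, starting from a zero state."""
    b = np.asarray(b, dtype=float)
    a = np.asarray(a, dtype=float)
    x = np.zeros_like(a)
    ys = []
    for u in u_seq:
        x, y = rtf_step(x, float(u), b, a, h0)
        ys.append(y)
    return np.array(ys)


if __name__ == "__main__":
    # First-order example: pole at 0.5, unit numerator, feedthrough 2.
    impulse = [1.0, 0.0, 0.0, 0.0]
    print(run_recurrence(impulse, b=[1.0], a=[-0.5], h0=2.0))  # [2.0, 1.0, 0.5, 0.25]
```

Feeding an impulse through the recurrence reproduces the convolution kernel one sample at a time, which is why the same coefficients can serve both the parallel and the single-step inference modes.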

Theorems & Definitions (9)

Lemma 3.1
Lemma 3.2
proof
Lemma 3.3
proof
Lemma 3.4
proof
Lemma 1.1 (Sandberg, 1963)
Lemma 2.1
