Length independent generalization bounds for deep SSM architectures via Rademacher contraction and stability constraints
Dániel Rácz, Mihály Petreczky, Bálint Daróczy
TL;DR
This work addresses the challenge of generalization in deep State-Space Model (SSM) architectures operating on long sequences by deriving a PAC bound that is independent of the sequence length. Central to the approach is the Rademacher Contraction (RC) framework, which bounds the Rademacher complexity of deep SSMs in terms of stability-driven norms of the SSM blocks (notably their $H_2$ and $\ell_1$ norms) and a controlled composition of RC blocks. The main result shows that, under mild assumptions and stability constraints, the generalization gap scales as $O(1/\sqrt{N})$, where $N$ is the number of training samples, with a bound that does not depend on the sequence length $T$, though it may grow with depth unless contraction holds. This provides a theoretical justification for using stability-enforced SSM blocks (as in S4, S5 and LRU) and offers a principled, architecture-agnostic explanation of why deep SSMs generalize well on long-range data. The framework interprets stability as a mechanism that controls generalization, and it sets the stage for tighter bounds and extensions to broader dynamical architectures.
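To make the stability-driven norms concrete, here is a minimal sketch, assuming a single-input single-output SSM with a real diagonal state matrix $A = \mathrm{diag}(\lambda)$ with $|\lambda_i| < 1$ and no feedthrough term. The helper names and parameter values are illustrative choices, not taken from the paper. It computes the $\ell_1$ and $H_2$ norms of the impulse response $h_k = C A^{k} B$ (indexed from $k = 0$) truncated at horizon $T$; because $A$ is stable, both quantities settle to finite limits as $T$ grows, which is the intuition behind a bound that does not depend on sequence length.

```python
import numpy as np

def impulse_response(lam, b, c, T):
    """Hypothetical helper: h_k = c^T diag(lam)^k b for k = 0..T-1 (SISO, no feedthrough)."""
    k = np.arange(T)                      # exponents 0..T-1
    powers = lam[None, :] ** k[:, None]   # shape (T, n): lam_i^k
    return powers @ (b * c)               # h_k = sum_i c_i lam_i^k b_i

def truncated_norms(lam, b, c, T):
    """l1 and H2 norms of the impulse response truncated at horizon T."""
    h = impulse_response(lam, b, c, T)
    return np.sum(np.abs(h)), np.sqrt(np.sum(h ** 2))

def h2_closed_form(lam, b, c):
    """Exact H2 norm: sum_k h_k^2 = w^T G w with w_i = b_i c_i, G_ij = 1 / (1 - lam_i lam_j)."""
    w = b * c
    gram = 1.0 / (1.0 - np.outer(lam, lam))
    return np.sqrt(w @ gram @ w)

# Illustrative stable system: every |lam_i| < 1.
lam = np.array([0.9, 0.5, -0.3])
b = np.array([1.0, 0.5, 2.0])
c = np.array([0.4, -1.0, 0.7])

for T in [10, 100, 1000, 10000]:
    l1, h2 = truncated_norms(lam, b, c, T)
    print(T, round(l1, 6), round(h2, 6))
print("closed-form H2:", round(h2_closed_form(lam, b, c), 6))
```

Increasing $T$ beyond about 1000 leaves both norms essentially unchanged here, and the truncated $H_2$ value matches the closed form; with $|\lambda_i| \ge 1$ the same sums would instead grow with $T$.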
Abstract
Many state-of-the-art models trained on long-range sequences, for example S4, S5 or LRU, are composed of sequential blocks combining State-Space Models (SSMs) with neural networks. In this paper we provide a PAC bound that holds for this kind of architecture with \emph{stable} SSM blocks and does not depend on the length of the input sequence. Imposing stability of the SSM blocks is standard practice in the literature, and it is known to help performance. Our results provide a theoretical justification for the use of stable SSM blocks, as the proposed PAC bound decreases as the degree of stability of the SSM blocks increases.
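The link between stability and the size of the norms can be illustrated on the norms themselves. The sketch below is hypothetical, not the paper's construction: it assumes an LRU-style parametrization of complex diagonal eigenvalues, $\lambda_j = \exp(-\exp(\nu_j) + i\exp(\theta_j))$, which guarantees $|\lambda_j| < 1$ for any real $\nu_j$, together with randomly drawn input and output weights. It evaluates the triangle-inequality bound $\sum_k |h_k| \le \sum_j |c_j b_j| / (1 - |\lambda_j|)$ on the $\ell_1$ norm of the impulse response $h_k = \sum_j c_j \lambda_j^k b_j$ as the eigenvalues are pushed further inside the unit circle.

```python
import numpy as np

def lru_eigenvalues(nu_log, theta_log):
    """Assumed LRU-style parametrization: lambda = exp(-exp(nu_log) + 1j * exp(theta_log)).
    Any real nu_log gives |lambda| = exp(-exp(nu_log)) < 1, i.e. a stable SSM."""
    return np.exp(-np.exp(nu_log) + 1j * np.exp(theta_log))

def l1_upper_bound(lam, b, c):
    """Triangle-inequality bound on sum_k |h_k| for h_k = sum_j c_j lam_j^k b_j."""
    return np.sum(np.abs(c * b) / (1.0 - np.abs(lam)))

def l1_truncated(lam, b, c, T):
    """l1 norm of the impulse response truncated at horizon T."""
    k = np.arange(T)
    h = (lam[None, :] ** k[:, None]) @ (c * b)
    return np.sum(np.abs(h))

rng = np.random.default_rng(0)
n = 8
theta_log = rng.normal(size=n)
b = rng.normal(size=n) + 1j * rng.normal(size=n)
c = rng.normal(size=n) + 1j * rng.normal(size=n)

# Shifting nu_log upward increases the decay rate exp(nu_log), i.e. moves every
# eigenvalue further inside the unit circle (a larger stability margin 1 - |lambda|).
for shift in [-4.0, -2.0, 0.0, 2.0]:
    lam = lru_eigenvalues(np.full(n, shift), theta_log)
    print(f"max|lambda|={np.abs(lam).max():.4f}  "
          f"l1 bound={l1_upper_bound(lam, b, c):.3f}  "
          f"l1 (T=5000)={l1_truncated(lam, b, c, 5000):.3f}")
```

Each upward shift of $\nu$ enlarges the stability margin $1 - |\lambda_j|$, and the printed $\ell_1$ bound shrinks accordingly, with the truncated $\ell_1$ norm staying below it. How such norms enter the paper's PAC bound is not reproduced here.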
