From Generalization Analysis to Optimization Designs for State Space Models
Fusheng Liu, Qianxiao Li
TL;DR
The paper tackles the generalization challenge of State Space Models (SSMs) for sequence modeling by deriving a data-dependent bound that ties the memory kernel $\rho_\theta$ to the data's temporal statistics $(\mu, K)$. Leveraging this bound, it introduces a principled initialization scaling that stabilizes initial output scales across different temporal patterns and a regularization term $\lambda\tau(\theta)$ that directly targets the generalization bound. The theoretical bound improves upon prior norm-based analyses by incorporating memory and temporal dependencies, and it is empirically validated on synthetic and real tasks (LRA), showing improved robustness and generalization with minimal overhead. Collectively, the work provides a concrete framework to design and regularize SSMs for varied temporal data, bridging memory structure and data dynamics with practical training strategies.
Abstract
A State Space Model (SSM) is a foundational model in time series analysis that has recently been shown to be an alternative to transformers in sequence modeling. In this paper, we theoretically study the generalization of SSMs and propose improvements to training algorithms based on the generalization results. Specifically, we give a \textit{data-dependent} generalization bound for SSMs, showing an interplay between the SSM parameters and the temporal dependencies of the training sequences. Leveraging the generalization bound, we (1) set up a scaling rule for model initialization based on the proposed generalization measure, which significantly improves the robustness of the output value scales of SSMs to different temporal patterns in the sequence data; (2) introduce a new regularization method for training SSMs to enhance the generalization performance. Numerical experiments are conducted to validate our results.
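
To make the idea of a data-dependent initialization scale concrete, the following is a minimal sketch rather than the paper's implementation. It assumes a diagonal SSM whose memory kernel is $\rho_\theta(s) = \sum_i c_i e^{a_i s} b_i$, and it uses a hypothetical proxy for the generalization measure that weights the kernel by the input standard deviation $\sqrt{K(s,s)}$ and the input mean $\mu(s)$. The exact measure is not stated in this section, so the function `scale_proxy`, the helper `memory_kernel`, and the synthetic $(\mu, K)$ are all assumptions chosen for illustration.

```python
import numpy as np

def memory_kernel(a, b, c, length, dt=1.0):
    """Discretized kernel rho[k] = sum_i c_i * exp(a_i * k * dt) * b_i (diagonal SSM)."""
    steps = np.arange(length) * dt
    return np.exp(np.outer(steps, a)) @ (b * c)

def scale_proxy(rho, mu, K):
    """Hypothetical proxy for the output scale; an assumed form, not taken from this section."""
    rev = rho[::-1]  # rho(T - s) is paired with the input statistics at time s
    spread = np.sum(np.abs(rev) * np.sqrt(np.diag(K)))  # kernel weighted by input std
    drift = abs(np.dot(rev, mu))                        # kernel applied to input mean
    return float(spread + drift)

# Synthetic temporal statistics (assumed): constant mean, exponentially decaying covariance.
T = 64
t = np.arange(T)
mu = np.full(T, 0.5)
K = 4.0 * np.exp(-np.abs(t[:, None] - t[None, :]) / 10.0)

# Random stable diagonal SSM parameters (negative real poles).
rng = np.random.default_rng(0)
a = -rng.uniform(0.01, 1.0, size=8)
b = rng.normal(size=8)
c = rng.normal(size=8)

# Rescale the output weights so the proxy equals 1 for this (mu, K).
before = scale_proxy(memory_kernel(a, b, c, T), mu, K)
c_scaled = c / before
after = scale_proxy(memory_kernel(a, b, c_scaled, T), mu, K)
```

Because the kernel is linear in `c` and the proxy is positively homogeneous in the kernel, dividing `c` by the proxy normalizes it to one regardless of the chosen $(\mu, K)$. This is the sense in which such a rescaling can make the initial output scale insensitive to the temporal pattern of the data. The same quantity could in principle be added to the training loss as a penalty, in the spirit of the $\lambda\tau(\theta)$ term mentioned in the TL;DR.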
