StableSSM: Alleviating the Curse of Memory in State-space Models through Stable Reparameterization
Shida Wang, Qianxiao Li
TL;DR
StableSSM addresses long-term memory in sequence modeling by proving that state-space models without reparameterization inherit a curse of memory similar to that of RNNs: they can stably approximate only targets with exponentially decaying memory. It introduces a class of stable reparameterizations that lifts this memory limitation and improves optimization stability, including a principled 'best' parameterization that balances gradient scales. The approach is validated on synthetic tasks, language modeling with WikiText-103, image classification, and Long Range Arena benchmarks. Together, the results give a theoretical and practical framework for designing stable SSMs that can learn targets with decaying memory and train more stably at scale.
Abstract
In this paper, we investigate the long-term memory learning capabilities of state-space models (SSMs) from the perspective of parameterization. We prove that state-space models without any reparameterization exhibit a memory limitation similar to that of traditional RNNs: the target relationships that can be stably approximated by state-space models must have an exponentially decaying memory. Our analysis identifies this "curse of memory" as a result of the recurrent weights converging to a stability boundary, suggesting that a reparameterization technique can be effective. To this end, we introduce a class of reparameterization techniques for SSMs that effectively lift their memory limitations. Besides improving approximation capabilities, we further illustrate that a principled choice of reparameterization scheme can also enhance optimization stability. We validate our findings using synthetic datasets, language models and image classification.
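
To make the role of the stability boundary concrete, the following is a minimal sketch, not the paper's implementation. It assumes a scalar continuous-time linear state dh/dt = λh, whose memory kernel is exp(λt) and which is stable only for λ < 0. The function names (`direct`, `exp_reparam`, `softplus_reparam`, `memory_kernel`) are hypothetical, and the exponential and softplus maps are used here as illustrative examples of stable reparameterizations; the abstract itself does not name specific maps.

```python
import numpy as np

def direct(w):
    # No reparameterization: the trainable weight is the eigenvalue itself.
    return w

def exp_reparam(w):
    # Assumed stable map: lambda = -exp(w) is negative for any finite real w.
    return -np.exp(w)

def softplus_reparam(w):
    # Assumed stable map: lambda = -log(1 + exp(w)), also always negative.
    return -np.logaddexp(0.0, w)

def memory_kernel(lam, t):
    # Impulse response of dh/dt = lam * h at time t.
    return np.exp(lam * t)

# A weight close to the stability boundary (lambda = 0), as needed to
# represent slowly decaying memory.
w = -0.05
# A small hypothetical parameter update that crosses zero.
w_perturbed = w + 0.1

horizon = 100.0
kernel_direct = memory_kernel(direct(w_perturbed), horizon)
kernel_exp = memory_kernel(exp_reparam(w_perturbed), horizon)
kernel_softplus = memory_kernel(softplus_reparam(w_perturbed), horizon)

print("direct   :", kernel_direct)    # grows with the horizon (unstable)
print("exp      :", kernel_exp)       # decays (stable)
print("softplus :", kernel_softplus)  # decays (stable)
```

In this sketch the same perturbation pushes the directly parameterized eigenvalue across the boundary, so its kernel grows with time, while both reparameterized eigenvalues stay negative by construction. Slow decay corresponds to λ close to 0, which the reparameterized models reach only as w tends to minus infinity.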
