SMR: State Memory Replay for Long Sequence Modeling

Biqing Qi, Junqi Gao, Kaiyan Zhang, Dong Li, Jianxing Liu, Ligang Wu, Bowen Zhou

TL;DR

This paper tackles the Non-Stable State (NSS) problem in long-sequence state space models (SSMs), where inputs sampled at points that differ from the training grid cause hidden-state errors to accumulate. It introduces State Memory Replay (SMR), a plug-in mechanism with learnable memories and convolutional gating, grounded in Event-Triggered Control (ETC) theory, that enables stable Sampling Step Adaptation (SSA) across varying sampling points. Theoretical analysis and experiments show that SMR improves generalization and performance for S4, S5, S6, SPADE, and Mega on WikiText-103 and the Long Range Arena (LRA) benchmark without sacrificing training speed. The work suggests a practical path to flexible, efficient long-sequence modeling with SSMs in real-world settings where sampling grids vary.
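
To make "learnable memories and convolutional gating" more concrete, the following is a minimal sketch of one way a causal, gated adjustment of an input sequence could look. It is not the paper's SMR formulation: the function names (`causal_conv`, `gated_replay`), the sigmoid gate, and the fixed `memory` weights are all assumptions chosen for illustration, and in a real model the weights would be learned.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def causal_conv(u, weights):
    """Causal 1-D convolution: out[k] = sum_j weights[j] * u[k - j], zero-padded on the left."""
    return [
        sum(w * u[k - j] for j, w in enumerate(weights) if k - j >= 0)
        for k in range(len(u))
    ]

def gated_replay(u, memory, bias=0.0):
    """Hypothetical gate: rescale each input using the current and earlier inputs only."""
    scores = causal_conv(u, memory)
    return [sigmoid(s + bias) * u_k for s, u_k in zip(scores, u)]

# Stand-in for learnable memory weights (assumed values, not from the paper).
memory = [0.5, 0.3, 0.2]
u = [1.0, 0.0, -1.0, 2.0]
adjusted = gated_replay(u, memory)

# Causality check: changing a later input leaves earlier outputs untouched.
u_changed = [1.0, 0.0, -1.0, 5.0]
adjusted_changed = gated_replay(u_changed, memory)
```

Because the gate at step k only looks at steps k, k-1, k-2, the whole adjustment can be computed for all positions at once, so it does not reintroduce a step-by-step recursion.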

Abstract

Despite the promising performance of state space models (SSMs) in long sequence modeling, limitations still exist. Although advanced SSMs like S5 and S6 (Mamba) address non-uniform sampling, their recursive structures impede efficient SSM computation via convolution. To overcome this incompatibility with parallel convolutional computation, this paper proposes a novel non-recursive strategy for processing non-uniform samples. Theoretical analysis of SSMs through the lens of Event-Triggered Control (ETC) theory reveals the Non-Stable State (NSS) problem, where deviations from sampling point requirements lead to error transmission and accumulation, causing the SSM's hidden state to diverge. Our analysis further reveals that adjusting input sequences with early memories can mitigate the NSS problem, achieving Sampling Step Adaptation (SSA). Building on this insight, we introduce a simple yet effective plug-and-play mechanism, State Memory Replay (SMR), which uses learnable memories to adjust the current state with multi-step information so that the model generalizes to sampling points different from those in the training data. This enables SSMs to stably model varying sampling points. Experiments on long-range modeling tasks in autoregressive language modeling and Long Range Arena demonstrate the general effectiveness of the SMR mechanism for a series of SSM models.
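
The two computation modes mentioned in the abstract, and the sensitivity to the sampling step, can be sketched with a scalar linear SSM. This is a toy illustration under stated assumptions, not the paper's analysis or experiments: the zero-order-hold discretization, the parameters `a`, `b`, `c`, the step sizes, and the sine input are all chosen here for illustration. It shows that the recurrent and convolutional forms give the same output, and that reusing a discretization fitted to one step size on inputs sampled at another step size changes the output.

```python
import math

def discretize(a, b, dt):
    """Zero-order-hold discretization of the scalar SSM x' = a*x + b*u (requires a != 0)."""
    a_bar = math.exp(a * dt)
    b_bar = (a_bar - 1.0) / a * b
    return a_bar, b_bar

def ssm_recurrent(a_bar, b_bar, c, u):
    """Step the hidden state one sample at a time: x_k = a_bar * x_{k-1} + b_bar * u_k."""
    x, ys = 0.0, []
    for u_k in u:
        x = a_bar * x + b_bar * u_k
        ys.append(c * x)
    return ys

def ssm_convolution(a_bar, b_bar, c, u):
    """Same output as a causal convolution with kernel K_j = c * a_bar**j * b_bar."""
    kernel = [c * (a_bar ** j) * b_bar for j in range(len(u))]
    return [sum(kernel[j] * u[k - j] for j in range(k + 1)) for k in range(len(u))]

# Assumed parameters, for illustration only.
a, b, c = -0.5, 1.0, 1.0
dt_train, dt_test, n = 0.1, 0.2, 64

# 1) Recurrent and convolutional forms agree on the training grid.
a_bar, b_bar = discretize(a, b, dt_train)
u_train = [math.sin(k * dt_train) for k in range(n)]
y_rec = ssm_recurrent(a_bar, b_bar, c, u_train)
y_conv = ssm_convolution(a_bar, b_bar, c, u_train)
max_gap = max(abs(p - q) for p, q in zip(y_rec, y_conv))

# 2) Same signal on a coarser grid, but the model keeps its dt_train discretization.
u_test = [math.sin(k * dt_test) for k in range(n)]
y_mismatch = ssm_recurrent(a_bar, b_bar, c, u_test)
y_matched = ssm_recurrent(*discretize(a, b, dt_test), c, u_test)
max_dev = max(abs(p - q) for p, q in zip(y_mismatch, y_matched))

print(f"recurrent vs convolution gap: {max_gap:.2e}")
print(f"deviation under step mismatch: {max_dev:.3f}")
```

The convolutional form is what allows parallel training, since the kernel is fixed once the step size is fixed. The second comparison only shows that a step-size mismatch perturbs the output of this stable toy system; it does not reproduce the hidden-state divergence the paper describes.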

Paper Structure (26 sections, 2 theorems, 27 equations, 7 figures, 6 tables)

This paper contains 26 sections, 2 theorems, 27 equations, 7 figures, 6 tables.

Key Result

Proposition 1

Given bounded inputs satisfying $\|u\|\le \zeta$, $\|\boldsymbol{C}\|\le c$ and $\|\boldsymbol{B}\|<b$, and defining the observation error caused by sampling points as $\boldsymbol{\varepsilon}_i = u^\prime_i-u_i$, it can be concluded that when $\lim_{t\rightarrow\infty}\|\boldsymbol{x}_t\|>\cdots$

Figures (7)

  • Figure 1: An example of the issue of NSS in SSM.
  • Figure 2: An illustrative instance of the NSS issue in S4 is presented here.
  • Figure 3: Illustration of the proposed SMR Mechanism.
  • Figure 4: Comparative results of S4 incorporated with SMR (S4+SMR) on the aforementioned examples. The pair of figures displays the prediction outcomes of S4+SMR for the perturbed input $u'$ (left) and the latent states when provided with inputs $u$ and $u'$ (right).
  • Figure 5: Schematic diagram of various SSMs after incorporating SMR.
  • ...and 2 more figures

Theorems & Definitions (2)

  • Proposition 1
  • Theorem 1