S7: Selective and Simplified State Space Layers for Sequence Modeling

Taylan Soydan, Nikola Zubić, Nico Messikommer, Siddhartha Mishra, Davide Scaramuzza

TL;DR

This work introduces S7, a simplified yet powerful SSM that handles input dependence. It combines a stable reparameterization with specific design choices that let state transitions adjust dynamically to the input content while maintaining efficiency and performance.

Abstract

A central challenge in sequence modeling is efficiently handling tasks with extended contexts. While recent state-space models (SSMs) have made significant progress in this area, they often lack input-dependent filtering or require substantial increases in model complexity to handle input variability. We address this gap by introducing S7, a simplified yet powerful SSM that handles input dependence while incorporating a stable reparameterization and specific design choices that dynamically adjust state transitions based on input content, maintaining efficiency and performance. We prove that this reparameterization ensures stability in long-sequence modeling by keeping state transitions well-behaved over time. Additionally, it controls the gradient norm, enabling efficient training and preventing exploding or vanishing gradients. S7 significantly outperforms baselines across a range of sequence modeling tasks, including neuromorphic event-based datasets, Long Range Arena benchmarks, and physical and biological time series. Overall, S7 offers a more straightforward approach to sequence modeling that does not rely on complex, domain-specific inductive biases, while achieving significant improvements across key benchmarks.
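To make the idea of a stable reparameterization concrete, here is a minimal sketch, assuming a scalar recurrence $x_k = \lambda x_{k-1} + u_k$ with $x_0 = 0$. The hypothetical function `stable_lambda` maps an unconstrained trainable parameter `w` to $\lambda = \exp(-\mathrm{softplus}(w))$, which equals $\sigma(-w) = 1/(1+e^{w})$ and therefore lies in $(0, 1)$ for every real `w`. This particular map is an illustrative choice, not necessarily the reparameterization used in S7. Because $\lambda < 1$, a bounded input keeps the state within $\max_k |u_k| / (1 - \lambda)$ no matter what value the optimizer assigns to `w`.

```python
import math


def softplus(z: float) -> float:
    """Numerically stable softplus, log(1 + exp(z)); strictly positive for every real z."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def stable_lambda(w: float) -> float:
    """Hypothetical stable reparameterization: lambda = exp(-softplus(w)) = 1 / (1 + exp(w)).

    Mathematically lambda is in (0, 1); in floating point it can round to 1.0
    for very negative w, so keep w in a moderate range.
    """
    return math.exp(-softplus(w))


def run_recurrence(w: float, inputs: list[float]) -> list[float]:
    """Scalar linear recurrence x_k = lam * x_{k-1} + u_k, starting from x_0 = 0."""
    lam = stable_lambda(w)
    x = 0.0
    states = []
    for u in inputs:
        x = lam * x + u
        states.append(x)
    return states


# Whatever the raw parameter, lambda stays inside (0, 1) ...
for w in (-4.0, -1.0, 0.0, 1.0, 4.0):
    print(f"w={w:+.1f} -> lambda={stable_lambda(w):.6f}")

# ... so a long constant input yields a bounded state with limit 1 / (1 - lambda).
states = run_recurrence(0.0, [1.0] * 10_000)
print(f"final state {states[-1]:.4f}, bound {1 / (1 - stable_lambda(0.0)):.4f}")
```

With a direct parameterization ($\lambda = w$), a single optimizer step that pushes $w$ above 1 would make the same recurrence diverge; the reparameterization rules that out by construction.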

Paper Structure (60 sections, 2 theorems, 64 equations, 3 figures, 7 tables)

Key Result

Theorem 3.5

Let $\mathbf{H}$ be any bounded, causal, continuous, and regular linear functional. Suppose $\mathbf{H}$ is approximated by a sequence of state-space models $\{ \widehat{\mathbf{H}}(\cdot; \theta_m) \}_{m=1}^\infty$ with input-dependent dynamics of the form given by the input-dependent SSM equation in the paper. Then, th…
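The input-dependent SSM equation that Theorem 3.5 refers to is not reproduced on this page. For orientation only, a generic input-dependent diagonal SSM consistent with the symbols used in Figure 1 can be written as below. This is an assumed form: the exact discretization, the maps $w_i(\cdot)$ from the input to unconstrained parameters, and the reparameterization $f$ are hypothetical placeholders, and the paper's equation may differ.

$$
\begin{aligned}
x_k &= \bar{\Lambda}_k\, x_{k-1} + B_k\, u_k, \\
y_k &= C_k\, x_k + D_k\, u_k, \\
\bar{\Lambda}_k &= \operatorname{diag}\big(f(w_1(u_k)), \dots, f(w_N(u_k))\big), \qquad f : \mathbb{R} \to (0, 1).
\end{aligned}
$$

In this sketch, the role of the reparameterization $f$ is to keep every diagonal entry of $\bar{\Lambda}_k$ inside $(0, 1)$ for every input $u_k$, so the state transitions remain contractive however the input modulates them.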

Figures (3)

  • Figure 1: The S7 layer architecture. The diagram illustrates the recurrent structure of the S7 model, which integrates input-dependent state-space models with a stable parameterization. The input-dependent matrices $B_k$, $C_k$, $D_k$, and $\bar{\Lambda}_k$ reflect the interaction between the input $u_k$ and the previous hidden state $x_{k-1}$, while the sigmoid reinforces the non-linearity. Unlike the input-dependent S6 (Mamba) [mamba], this model is much simpler and builds on S5 [smith2023simplified].
  • Figure 2: Per time-step regression results on the Walker2d kinematic dataset. Our S7 model achieves the lowest MSE.
  • Figure 3: Visualized frames from the Walker2d kinematic dataset.
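The recurrence described in the Figure 1 caption can be sketched as a toy single-input, single-output layer in plain Python. Everything here is an assumption for illustration: the class name `ToySelectiveSSM`, its parameter names, and the specific maps from $u_k$ to $\bar{\Lambda}_k$ (exponential of a negative softplus), $B_k$ (a sigmoid gate), and $C_k$, $D_k$ (affine in $u_k$) are hypothetical and are not taken from the S7 implementation. What the sketch does show is the structure: every step recomputes the per-step matrices from the current input, and the decay map keeps each diagonal entry of $\bar{\Lambda}_k$ in $(0, 1)$ regardless of the input.

```python
import math
import random


def softplus(z: float) -> float:
    """Numerically stable softplus, log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def sigmoid(z: float) -> float:
    """Logistic sigmoid via the overflow-free identity sigmoid(z) = (1 + tanh(z / 2)) / 2."""
    return 0.5 * (1.0 + math.tanh(0.5 * z))


class ToySelectiveSSM:
    """Hypothetical single-input, single-output diagonal SSM in which the per-step
    decay lam_k, input gate B_k, readout C_k and skip term D_k all depend on u_k."""

    def __init__(self, state_size: int = 4, seed: int = 0):
        rng = random.Random(seed)

        def draw() -> list[float]:
            return [rng.gauss(0.0, 1.0) for _ in range(state_size)]

        self.a0, self.a1 = draw(), draw()  # decay: lam_k = exp(-softplus(a0 + a1 * u_k))
        self.b1 = draw()                   # input gate: B_k = sigmoid(b1 * u_k)
        self.c0, self.c1 = draw(), draw()  # readout: C_k = c0 + c1 * u_k
        self.d0, self.d1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)  # skip: D_k = d0 + d1 * u_k

    def step(self, x_prev: list[float], u: float) -> tuple[list[float], float]:
        """x_k = lam_k * x_{k-1} + B_k * u_k (elementwise), y_k = C_k . x_k + D_k * u_k."""
        x = []
        for i, xp in enumerate(x_prev):
            lam = math.exp(-softplus(self.a0[i] + self.a1[i] * u))  # in (0, 1) for any u
            gate = sigmoid(self.b1[i] * u)
            x.append(lam * xp + gate * u)
        y = sum((c0 + c1 * u) * xi for c0, c1, xi in zip(self.c0, self.c1, x))
        y += (self.d0 + self.d1 * u) * u
        return x, y

    def __call__(self, inputs: list[float]) -> list[float]:
        """Run the recurrence from a zero initial state and return y_1, ..., y_T."""
        x = [0.0] * len(self.a0)
        outputs = []
        for u in inputs:
            x, y = self.step(x, u)
            outputs.append(y)
        return outputs


model = ToySelectiveSSM(state_size=4, seed=0)
ys = model([math.sin(0.1 * k) for k in range(200)])
print(len(ys), max(abs(y) for y in ys))
```

Because the matrices change with $u_k$, the layer cannot be evaluated as one fixed convolution kernel; a practical implementation would typically replace this sequential loop with a parallel associative scan over the per-step pairs $(\bar{\Lambda}_k, B_k u_k)$.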

Theorems & Definitions (6)

  • Theorem 3.5: Existence of Stable Approximation by Stable Reparameterization with Input-Dependent Dynamics
  • Proof
  • Theorem 3.6: Parameterizations Influence the Gradient Norm Scale in Input-Dependent SSMs
  • Proof
  • Proof
  • Proof
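Theorem 3.6 concerns how the choice of parameterization changes the scale of gradients. A minimal scalar illustration, assuming a loss that reads off the impulse response at lag $T$, $L(w) = \lambda(w)^T$: by the chain rule $dL/dw = T\,\lambda^{T-1}\,\lambda'(w)$, so the same decay value $\lambda$ produces different gradient magnitudes depending on $\lambda'(w)$. The two maps compared here, a direct parameterization $\lambda = w$ and the bounded map $\lambda = \exp(-\mathrm{softplus}(w))$, are hypothetical choices for illustration; the paper's specific reparameterization and the bound stated in the theorem are not reproduced.

```python
import math


def softplus(z: float) -> float:
    """Numerically stable softplus, log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def sigmoid(z: float) -> float:
    """Logistic sigmoid via (1 + tanh(z / 2)) / 2, which never overflows."""
    return 0.5 * (1.0 + math.tanh(0.5 * z))


# Each parameterization maps a raw parameter w to (lambda(w), dlambda/dw).
def direct(w: float) -> tuple[float, float]:
    """lambda = w: no built-in stability constraint."""
    return w, 1.0


def exp_softplus(w: float) -> tuple[float, float]:
    """lambda = exp(-softplus(w)) = sigmoid(-w), always in (0, 1)."""
    lam = math.exp(-softplus(w))
    return lam, -lam * sigmoid(w)  # chain rule: softplus'(w) = sigmoid(w)


def grad_of_lag_response(param, w: float, T: int) -> float:
    """dL/dw for L(w) = lambda(w) ** T, i.e. T * lambda**(T - 1) * dlambda/dw."""
    lam, dlam = param(w)
    return T * lam ** (T - 1) * dlam


def w_for_exp_softplus(lam: float) -> float:
    """Inverse of exp_softplus: w = log(1 / lam - 1)."""
    return math.log(math.expm1(-math.log(lam)))


# Compare gradient magnitudes at the same decay value lambda.
T = 100
for lam in (0.9, 0.99, 0.999):
    g_direct = grad_of_lag_response(direct, lam, T)
    g_bounded = grad_of_lag_response(exp_softplus, w_for_exp_softplus(lam), T)
    print(f"lambda={lam}: direct {g_direct:+.4g}, exp-softplus {g_bounded:+.4g}")
```

For this particular map $\lambda'(w) = -\lambda(1-\lambda)$, so at the same decay value its gradient is smaller than the direct one by a factor of $\lambda(1-\lambda)$, which shrinks as $\lambda \to 1$. This only illustrates that the parameterization rescales gradients.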