S7: Selective and Simplified State Space Layers for Sequence Modeling

Taylan Soydan; Nikola Zubić; Nico Messikommer; Siddhartha Mishra; Davide Scaramuzza

TL;DR

This work introduces S7, a simplified yet powerful state-space model (SSM) whose state transitions adapt dynamically to input content. A stable reparameterization and a few specific design choices keep the model efficient and performant.

Abstract

A central challenge in sequence modeling is efficiently handling tasks with extended contexts. While recent state-space models (SSMs) have made significant progress in this area, they often lack input-dependent filtering or require substantial increases in model complexity to handle input variability. We address this gap by introducing S7, a simplified yet powerful SSM that can handle input dependence while incorporating stable reparameterization and specific design choices to dynamically adjust state transitions based on input content, maintaining efficiency and performance. We prove that this reparameterization ensures stability in long-sequence modeling by keeping state transitions well-behaved over time. Additionally, it controls the gradient norm, enabling efficient training and preventing issues like exploding or vanishing gradients. S7 significantly outperforms baselines across various sequence modeling tasks, including neuromorphic event-based datasets, Long Range Arena benchmarks, and various physical and biological time series. Overall, S7 offers a more straightforward approach to sequence modeling without relying on complex, domain-specific inductive biases, achieving significant improvements across key benchmarks.
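The idea of input-dependent state transitions with a stability constraint can be illustrated with a toy recurrence. The sketch below is not the S7 implementation: the function names, the scalar input, the fixed `b`, `c`, `d` weights, and the `exp(-softplus(.))` mapping are all assumptions chosen for illustration, and the paper's actual reparameterization may differ. It only shows the general pattern: each diagonal transition entry is computed from the current input and mapped into (0, 1), so the state cannot blow up.

```python
import math


def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def stable_transition(u_k, w_lam, b_lam):
    # Hypothetical stable mapping (an assumption, not the paper's formula):
    # exp(-softplus(.)) lies in (0, 1) in exact arithmetic for any real input.
    return [math.exp(-softplus(w * u_k + b)) for w, b in zip(w_lam, b_lam)]


def input_dependent_scan(u, w_lam, b_lam, b, c, d):
    """Toy diagonal SSM with an input-dependent transition.

    x_k = lam_k * x_{k-1} + b * u_k   (elementwise over the state)
    y_k = sum(c * x_k) + d * u_k
    Here b, c, d are fixed for brevity; only lam_k depends on the input.
    """
    x = [0.0] * len(b)
    ys, lams = [], []
    for u_k in u:
        lam_k = stable_transition(u_k, w_lam, b_lam)
        x = [l * xi + bi * u_k for l, xi, bi in zip(lam_k, x, b)]
        ys.append(sum(ci * xi for ci, xi in zip(c, x)) + d * u_k)
        lams.append(lam_k)
    return ys, lams


if __name__ == "__main__":
    ys, lams = input_dependent_scan(
        u=[0.5, -1.0, 2.0, 0.0],
        w_lam=[1.0, -0.5],
        b_lam=[0.0, 0.3],
        b=[1.0, 0.5],
        c=[0.7, -0.2],
        d=0.1,
    )
    print(ys)
    print(lams)
```

Because every transition entry stays below one in magnitude, the contribution of an old input decays geometrically, which is the intuition behind keeping state transitions well-behaved over long sequences.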

Paper Structure

This paper contains 60 sections, 2 theorems, 64 equations, 3 figures, and 7 tables.

Introduction
Related work
Method
Background
State Space Models (SSMs)
Discretization of Continuous SSMs
Input Dependency in State-Space Models
The S7 Layer
Stable Reparameterization for Long-Term Dependencies
Additional Design Choices for Event-Based Neuromorphic Tasks
Efficient Tokenization for Event-Based Data
Efficiency Through Event Pooling and Asynchronous Discretization
Experiments
Experimental Setup
Event (Neuromorphic) Datasets
...and 45 more sections

Key Result

Theorem 3.5

Let $\mathbf{H}$ be any bounded, causal, continuous, and regular linear functional. Suppose $\mathbf{H}$ is approximated by a sequence of state-space models $\{ \widehat{\mathbf{H}}(\cdot; \theta_m) \}_{m=1}^\infty$ with input-dependent dynamics of the form given by the paper's input-dependent SSM equation. Then, th

Figures (3)

Figure 1: The S7 Layer Architecture. The diagram illustrates the recurrent structure of the S7 model, which integrates input-dependent state-space models with stable parameterization. The transition matrices $B_k$, $C_k$, $D_k$, and $\bar{\Lambda}_k$ reflect the interaction between input $u_k$ and previous hidden state $x_{k-1}$, while non-linearity is reinforced by the sigmoid. Unlike the input-dependent S6 layer of Mamba [mamba], this model is much simpler and builds on S5 [smith2023simplified].
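Using the symbols in the Figure 1 caption, the recurrence it describes can be sketched as a time-varying linear state update. The form below is an assumption inferred from the caption's notation and the usual diagonal SSM recurrence, not a copy of the paper's equation; the subscript $k$ marks quantities that are recomputed from the input at each step.

```latex
\begin{aligned}
x_k &= \bar{\Lambda}_k \, x_{k-1} + B_k \, u_k, \\
y_k &= C_k \, x_k + D_k \, u_k .
\end{aligned}
```

Under this reading, the input-independent case corresponds to constant matrices, that is, $\bar{\Lambda}_k = \bar{\Lambda}$, $B_k = B$, $C_k = C$, and $D_k = D$ for all $k$.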
Figure 2: Per time-step regression results on the Walker2d kinematic dataset. Our S7 model achieves the lowest MSE.
Figure 3: Walker2D kinematic dataset frames visualized.

Theorems & Definitions (6)

Theorem 3.5: Existence of Stable Approximation by Stable Reparameterization with Input-Dependent Dynamics
proof
Theorem 3.6: Parameterizations Influence the Gradient Norm Scale in Input-Dependent SSMs
proof
proof
proof
