WaveSSM: Multiscale State-Space Models for Non-stationary Signal Attention

Ruben Solozabal; Velibor Bojkovic; Hilal Alquabeh; Klea Ziu; Kentaro Inui; Martin Takac

TL;DR

This work introduces WaveSSM, a collection of SSMs constructed over wavelet frames. Under equal conditions, it outperforms orthogonal-basis counterparts such as S4 on real-world datasets with transient dynamics, including physiological signals from the PTB-XL dataset and raw audio from Speech Commands.

Abstract

State-space models (SSMs) have emerged as a powerful foundation for long-range sequence modeling, with the HiPPO framework showing that continuous-time projection operators can be used to derive stable, memory-efficient dynamical systems that encode the history of the input signal. However, existing projection-based SSMs often rely on polynomial bases with global temporal support, whose inductive biases are poorly matched to signals exhibiting localized or transient structure. In this work, we introduce \emph{WaveSSM}, a collection of SSMs constructed over wavelet frames. Our key observation is that wavelet frames have localized temporal support, which is useful for tasks requiring precise localization. Empirically, we show that under equal conditions \emph{WaveSSM} outperforms orthogonal-basis counterparts such as S4 on real-world datasets with transient dynamics, including physiological signals from the PTB-XL dataset and raw audio from Speech Commands.

Paper Structure (66 sections, 4 theorems, 85 equations, 18 figures, 8 tables)

Introduction
Background
State-Space Models for Sequence Representation
Scaled measure.
Translated measure.
Methodology: State Space Models over Wavelet Frames
Wavelet frame construction
Stability considerations
Why wavelet frames outperform polynomials on localized irregularities
Why Linear Time-Invariant HiPPO Kernels Cannot Preserve Disjoint Temporal Windows
HiPPO as a Single Convolutional Memory Trace
Superposition Prevents Temporal Attention
Storing the Whole History Is Not Storing It Usefully
Incorporating Wavelet Frames into the S4 Architecture
Experimentation
...and 51 more sections

Key Result

Lemma 3.1

Let $F \in \mathbb{R}^{N \times L}$ be a discretized frame matrix with full row rank and let $S := F F^{\ast}$ denote the associated frame operator. Let $\dot F$ denote the row-wise time derivatives of the sampled frame atoms, and define $A$ by the above least-squares projection. Then $A$ is uniquel
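
The least-squares projection that the lemma refers to is not reproduced in this excerpt. As a rough illustration only, the sketch below assumes the objective $\min_A \|A F - \dot F\|_F$, whose normal equations give $A = \dot F F^{\ast} S^{-1}$; that objective, the Gaussian stand-in atoms, and the helper names `gaussian_frame` and `least_squares_transition` are assumptions made here, not the paper's construction. When $F$ has full row rank, $S$ is invertible, so the minimizer of this assumed objective is unique.

```python
import numpy as np

def gaussian_frame(num_atoms, num_samples, width=0.08):
    """Hypothetical stand-in frame: Gaussian atoms sampled on [0, 1].

    Returns F (num_atoms x num_samples) and the row-wise time derivative F_dot.
    """
    t = np.linspace(0.0, 1.0, num_samples)
    centers = np.linspace(0.0, 1.0, num_atoms)
    offset = t[None, :] - centers[:, None]
    F = np.exp(-0.5 * (offset / width) ** 2)
    F_dot = -(offset / width**2) * F  # analytic d/dt of each Gaussian atom
    return F, F_dot

def least_squares_transition(F, F_dot):
    """Assumed objective: A = argmin_A ||A F - F_dot||_F for a real frame F."""
    S = F @ F.T      # frame operator S = F F^*
    G = F_dot @ F.T  # cross term F_dot F^*
    # Normal equations A S = G; S is symmetric, so solve S A^T = G^T.
    return np.linalg.solve(S, G.T).T

F, F_dot = gaussian_frame(num_atoms=8, num_samples=256)
A = least_squares_transition(F, F_dot)

# At the minimizer, the residual is orthogonal to the rows of F.
residual = A @ F - F_dot
print(A.shape, np.linalg.norm(residual @ F.T))
```

Solving the normal equations directly keeps the link to the frame operator $S$ visible; for poorly conditioned frames, a solver such as `np.linalg.lstsq` would be the more robust choice.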

Figures (18)

Figure 1: Comparison between global polynomial bases and localized wavelet frames. Left: Legendre orthogonal basis functions, used in the HiPPO framework to construct an online projection of the input signal, have global support. Right: wavelet-based frames (Morlet in this example) have more localized time support, promoting selective representations that can attend to transient events more effectively.
Figure 2: Left: Overview of the WaveSSM framework. The input signal is projected onto a wavelet frame, inducing a continuous-time state-space representation via differentiation of the projection. The resulting SSM maintains a compact latent state with addressable, time-local components, which can be decoded independently or processed for sequence modeling. Right: Visualization of the Jacobian $\partial h_T / \partial x_t$, illustrating the influence of inputs at different time steps on the final hidden state. Results are shown for Morlet wavelets of order $N=100$ under the scaled measure $\mu_{sc}$; see Appendix \ref{jacobian} for details.
Figure 3: Visualization of the transition matrices $A$ obtained for the wavelet families for $N=40$. From left to right: Morlet, Gaussian-derivative (Gauss), Mexican hat (Mexhat), Discrete Prolate Spheroidal Slepians (DPSS), and Daubechies (db6).
Figure 4: Condition number of the discretized frame operator $\kappa(S)$ as a function of the state dimension $N$. Raw frames (dashed lines) can become ill-conditioned as $N$ grows, while tightening $F$ (solid lines) mitigates this degradation. The improvement is especially pronounced for Morlet, Gaussian, Daubechies, and Mexican-hat frames.
Figure 5: Approximation errors of various frames on the two-step piecewise-constant function; see Appendix \ref{app:step-budget} for details.
...and 13 more figures
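
The effect described in the Figure 4 caption can be illustrated with a small numerical sketch. It computes $\kappa(S)$ for a raw frame and for a tightened one, assuming that tightening means the canonical tight frame $S^{-1/2} F$, which has frame operator equal to the identity. The paper's exact tightening procedure is not given in this excerpt, and the Mexican-hat atoms, their scale and spacing, and the names `mexican_hat_frame`, `frame_condition_number`, and `tighten` are illustrative assumptions.

```python
import numpy as np

def mexican_hat_frame(num_atoms, num_samples, scale=0.05):
    """Hypothetical frame: translated Mexican-hat atoms sampled on [0, 1]."""
    t = np.linspace(0.0, 1.0, num_samples)
    centers = np.linspace(0.0, 1.0, num_atoms)
    u = (t[None, :] - centers[:, None]) / scale
    return (1.0 - u**2) * np.exp(-0.5 * u**2)

def frame_condition_number(F):
    """Condition number kappa(S) of the frame operator S = F F^T."""
    return np.linalg.cond(F @ F.T)

def tighten(F):
    """Canonical tight frame S^{-1/2} F (assumed meaning of 'tightening')."""
    S = F @ F.T
    eigvals, eigvecs = np.linalg.eigh(S)  # S is symmetric positive definite
    S_inv_sqrt = (eigvecs / np.sqrt(eigvals)) @ eigvecs.T
    return S_inv_sqrt @ F

F_raw = mexican_hat_frame(num_atoms=8, num_samples=512)
F_tight = tighten(F_raw)

kappa_raw = frame_condition_number(F_raw)
kappa_tight = frame_condition_number(F_tight)
print(f"kappa(S) raw: {kappa_raw:.3f}, tightened: {kappa_tight:.3f}")
```

This construction requires $F$ to have full row rank so that $S$ is invertible; for nearly rank-deficient frames the inverse square root itself becomes numerically unreliable.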

Theorems & Definitions (8)

Lemma 3.1
Theorem 3.2: Informal
Remark 6.1
Lemma 1.1: Truncation bound for Parseval frames
proof
Theorem 2.1: Parseval wavelet frames approximations
proof
proof
