WaveSSM: Multiscale State-Space Models for Non-stationary Signal Attention

Ruben Solozabal, Velibor Bojkovic, Hilal Alquabeh, Klea Ziu, Kentaro Inui, Martin Takac

TL;DR

This work introduces WaveSSM, a collection of SSMs constructed over wavelet frames that outperforms orthogonal counterparts such as S4 on real-world datasets with transient dynamics, including physiological signals from the PTB-XL dataset and raw audio from Speech Commands.

Abstract

State-space models (SSMs) have emerged as a powerful foundation for long-range sequence modeling, with the HiPPO framework showing that continuous-time projection operators can be used to derive stable, memory-efficient dynamical systems that encode the past history of the input signal. However, existing projection-based SSMs often rely on polynomial bases with global temporal support, whose inductive biases are poorly matched to signals exhibiting localized or transient structure. In this work, we introduce WaveSSM, a collection of SSMs constructed over wavelet frames. Our key observation is that wavelet frames have localized support in time, which is useful for tasks requiring precise localization. Empirically, we show that, under equal conditions, WaveSSM outperforms orthogonal counterparts such as S4 on real-world datasets with transient dynamics, including physiological signals from the PTB-XL dataset and raw audio from Speech Commands.

Paper Structure (66 sections, 4 theorems, 85 equations, 18 figures, 8 tables)

This paper contains 66 sections, 4 theorems, 85 equations, 18 figures, 8 tables.

Key Result

Lemma 3.1

Let $F \in \mathbb{R}^{N \times L}$ be a discretized frame matrix with full row rank and let $S := F F^{\ast}$ denote the associated frame operator. Let $\dot F$ denote the row-wise time derivatives of the sampled frame atoms, and define $A$ by the above least-squares projection. Then $A$ is uniquel
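
The construction in Lemma 3.1 can be illustrated numerically. The following is a minimal sketch under assumptions that the excerpt does not confirm: the least-squares problem is taken to be $\min_A \lVert A F - \dot F \rVert_F$, whose normal-equation solution is $A = \dot F F^{\ast} S^{-1}$ when $F$ has full row rank. The real Morlet-like atoms, their centers, the `width` and `freq` parameters, and the helper names `morlet_frame` and `transition_matrix` are all hypothetical choices made for illustration.

```python
import numpy as np

def morlet_frame(N, L, width=0.08, freq=6.0):
    """Sample N real Morlet-like atoms (hypothetical choice) at L points on [0, 1].

    Returns the frame matrix F (N x L) and its row-wise time derivative dF.
    """
    t = np.linspace(0.0, 1.0, L)
    centers = np.linspace(0.0, 1.0, N)[:, None]
    u = (t[None, :] - centers) / width
    envelope = np.exp(-0.5 * u**2)
    F = envelope * np.cos(freq * u)
    # Analytic derivative with respect to t (chain rule, du/dt = 1 / width).
    dF = envelope * (-u * np.cos(freq * u) - freq * np.sin(freq * u)) / width
    return F, dF

def transition_matrix(F, dF):
    """Assumed least-squares fit: A = argmin_A ||A F - dF||_F = dF F^T S^{-1}."""
    S = F @ F.T  # frame operator; F is real, so F^* = F^T
    # S is symmetric, so A S = dF F^T is equivalent to S A^T = F dF^T.
    A = np.linalg.solve(S, F @ dF.T).T
    return A, S

F, dF = morlet_frame(N=8, L=256)
A, S = transition_matrix(F, dF)
print("rank(F) =", np.linalg.matrix_rank(F))
print("A shape =", A.shape)
```

Full row rank makes $S$ invertible, so the linear solve has exactly one solution; this is the sense in which the sketch's $A$ is unique.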

Figures (18)

  • Figure 1: Comparison between global polynomial bases and localized wavelet frames. Left: Legendre orthogonal-basis functions are used in the HiPPO framework to construct an online projection of the input signal with global support. Right: wavelet-based frames (Morlet in the example) exhibit more localized time support, promoting selective representations that can attend to transient events more effectively.
  • Figure 2: Left: Overview of the WaveSSM framework. The input signal is projected onto a wavelet frame, inducing a continuous-time state-space representation via differentiation of the projection. The resulting SSM maintains a compact latent state with addressable, time-local components, which can be decoded independently or processed for sequence modeling. Right: Visualization of the Jacobian $\partial h_T / \partial x_t$, illustrating the influence of inputs at different time steps on the final hidden state. Results are shown for Morlet wavelets of order $N=100$ under the scaled measure $\mu_{sc}$; see Appendix \ref{jacobian} for details.
  • Figure 3: Visualization of the transition matrices $A$ obtained for the wavelet families for $N=40$. From left to right: Morlet, Gaussian-derivative (Gauss), Mexican hat (Mexhat), Discrete Prolate Spheroidal Slepians (DPSS), and Daubechies (db6).
  • Figure 4: Condition number of the discretized frame operator $\kappa(S)$ as a function of the state dimension $N$. Raw frames (dashed lines) can become ill-conditioned as $N$ grows, while tightening $F$ (solid lines) mitigates this degradation. The improvement is especially pronounced for Morlet, Gaussian, Daubechies, and Mexican-hat frames.
  • Figure 5: Approximation errors on various frames for the two-step piecewise constant function; see Appendix \ref{app:step-budget} for details.
  • ...and 13 more figures
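
The effect described in the Figure 4 caption can be sketched numerically. This is an illustration under stated assumptions, not the paper's procedure: "tightening" is assumed to mean the canonical Parseval construction $F_{\text{tight}} = S^{-1/2} F$ with $S = F F^{\ast}$, and the Mexican-hat atoms, their centers, the `width` parameter, and the helper names `mexhat_frame` and `tighten` are hypothetical.

```python
import numpy as np

def mexhat_frame(N, L, width=0.04):
    """Sample N Mexican-hat atoms (hypothetical layout) at L points on [0, 1]."""
    t = np.linspace(0.0, 1.0, L)
    centers = np.linspace(0.0, 1.0, N)[:, None]
    u = (t[None, :] - centers) / width
    return (1.0 - u**2) * np.exp(-0.5 * u**2)

def tighten(F):
    """Assumed tightening step: F_tight = S^{-1/2} F, so F_tight F_tight^T = I."""
    S = F @ F.T
    evals, evecs = np.linalg.eigh(S)  # S is symmetric positive definite
    # S^{-1/2} = V diag(1 / sqrt(lambda)) V^T
    S_inv_sqrt = (evecs / np.sqrt(evals)) @ evecs.T
    return S_inv_sqrt @ F

F = mexhat_frame(N=16, L=512)
F_tight = tighten(F)

kappa_raw = np.linalg.cond(F @ F.T)
kappa_tight = np.linalg.cond(F_tight @ F_tight.T)
print("kappa(S) raw      :", kappa_raw)
print("kappa(S) tightened:", kappa_tight)
```

Under this assumption the tightened frame operator is the identity up to rounding, so its condition number is close to 1 regardless of how ill-conditioned the raw frame is, provided $F$ has full row rank.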

Theorems & Definitions (8)

  • Lemma 3.1
  • Theorem 3.2: Informal
  • Remark 6.1
  • Lemma 1.1: Truncation bound for Parseval frames
  • proof
  • Theorem 2.1: Parseval wavelet frames approximations
  • proof
  • proof