Bridging Expressivity and Scalability with Adaptive Unitary SSMs
Arjun Karuvally, Franz Nowak, Anderson T. Keller, Carmen Amo Alonso, Terrence J. Sejnowski, Hava T. Siegelmann
TL;DR
This work addresses the expressivity–scalability gap in long-sequence modeling by introducing the Adaptive Unitary State Space Model (AUSSM), which uses an input-dependent skew-symmetric recurrence to produce unitary dynamics and rich temporal representations. Theoretical results show that AUSSM can perform modulo counting and, when combined with Mamba, achieve maximal expressivity within diagonal SSMs, effectively realizing solvable regular languages. To make this expressive power scale, the authors develop a separable convolution formulation and a CUDA kernel that reduce the cost of adaptive recurrence from quadratic to linear in time and space, enabling practical training on long sequences. Empirically, AUSSM and the hybrid AUSSM+Mamba model outperform prior SSMs on algorithmic tasks and perform strongly on real-world long time-series benchmarks, including state-of-the-art results on Weather forecasting. The work also draws connections between adaptive unitary dynamics and conserved neural trajectories, suggesting a robust inductive bias for both symbolic and continuous sequence modeling, with potentially broad impact on scalable temporal reasoning.
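To see why an input-dependent recurrence can still be evaluated in linear time, consider a deliberately simplified diagonal case. This is an illustrative assumption, not the AUSSM kernel or its CUDA implementation. With per-channel angles theta_t and the update h_t = exp(i*theta_t) * h_{t-1} + b * x_t, unrolling gives h_t = sum over s <= t of exp(i*(Phi_t - Phi_s)) * b * x_s, where Phi_t = theta_0 + ... + theta_t. Evaluated directly, this time-varying kernel costs O(T^2), but it factors as exp(i*Phi_t) * exp(-i*Phi_s), so one cumulative sum produces every output in O(T). The function names `recurrent_scan`, `quadratic_kernel` and `separable_scan`, and the tanh projection that produces the angles, are hypothetical.

```python
import numpy as np


def recurrent_scan(theta, x, b):
    """Sequential reference: h_t = exp(i*theta_t) * h_{t-1} + b * x_t, with h_{-1} = 0."""
    h = np.zeros(b.shape, dtype=complex)
    out = []
    for t in range(len(x)):
        # Unit-modulus, input-dependent transition (diagonal unitary step).
        h = np.exp(1j * theta[t]) * h + b * x[t]
        out.append(h)
    return np.stack(out)


def quadratic_kernel(theta, x, b):
    """Naive O(T^2) form: h_t = sum_{s<=t} exp(i*(Phi_t - Phi_s)) * b * x_s."""
    phi = np.cumsum(theta, axis=0)  # Phi_t = theta_0 + ... + theta_t, shape (T, N)
    T = len(x)
    out = np.zeros((T,) + b.shape, dtype=complex)
    for t in range(T):
        for s in range(t + 1):
            out[t] += np.exp(1j * (phi[t] - phi[s])) * b * x[s]
    return out


def separable_scan(theta, x, b):
    """Separable O(T) form: the kernel splits into exp(i*Phi_t) * exp(-i*Phi_s)."""
    phi = np.cumsum(theta, axis=0)
    # Inner factor depends only on s, so a prefix sum handles every t at once.
    inner = np.cumsum(np.exp(-1j * phi) * b * x[:, None], axis=0)
    return np.exp(1j * phi) * inner


# Usage: random inputs; the angles are a hypothetical input-dependent projection.
rng = np.random.default_rng(0)
T, N = 64, 8  # sequence length, number of state channels
x = rng.standard_normal(T)
w = rng.standard_normal(N)
theta = np.pi * np.tanh(np.outer(x, w))  # shape (T, N)
b = rng.standard_normal(N) + 1j * rng.standard_normal(N)

h_sep = separable_scan(theta, x, b)
print(np.allclose(recurrent_scan(theta, x, b), h_sep),
      np.allclose(quadratic_kernel(theta, x, b), h_sep))
```

Because both factors have unit modulus, the factorization does not overflow. Applying the same trick to a decaying recurrence would require dividing by a product of decay factors that grows without bound. Over very long sequences, rounding error in the cumulative phase and cumulative sum still accumulates, so a practical kernel would likely need extra care such as chunking. This sketch does not attempt that.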
Abstract
Recent work has revealed that state space models (SSMs), while efficient for long-sequence processing, are fundamentally limited in their ability to represent formal languages, particularly because of their time-invariant and real-valued recurrence structures. In this work, we draw inspiration from adaptive and structured dynamics observed in biological neural systems and introduce the Adaptive Unitary State Space Model (AUSSM), a novel class of SSMs that leverages skew-symmetric, input-dependent recurrence to achieve unitary evolution and high expressive power. Using algebraic automata theory, we prove that AUSSM can perform modulo counting and simulate solvable group automata at finite precision, enabling AUSSM to model a broad class of regular languages that are out of reach for other SSM architectures. To overcome the practical inefficiencies of adaptive recurrence, we develop a separable convolution formulation and a CUDA implementation that enables scalable parallel training. Empirically, we show that AUSSM and its hybrid variant (interleaved with Mamba) outperform prior SSMs on formal algorithmic tasks such as parity and modular arithmetic, and achieve competent performance on real-world long time-series classification benchmarks. Our results demonstrate that adaptive unitary recurrence provides a powerful and efficient inductive bias for both symbolic and continuous sequence modeling. The code is available at https://github.com/arjunkaruvally/AUSSM.
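Modulo counting can be illustrated with a toy example. This uses a hypothetical parameterization, not the one AUSSM learns. A real 2x2 skew-symmetric generator [[0, -theta], [theta, 0]] has eigenvalues +i*theta and -i*theta, so in its diagonal basis each step multiplies the state by the unit-modulus factor exp(i*theta). Choose the input-dependent angle theta(1) = 2*pi/m and theta(0) = 0. The phase of the state then tracks the number of 1s modulo m, and its magnitude stays at 1 up to rounding. Parity is the case m = 2. The function name `mod_count_unitary` is illustrative only.

```python
import numpy as np


def mod_count_unitary(bits, m=2):
    """Toy 1-D unitary recurrence that counts 1s modulo m (hypothetical parameterization)."""
    step = 2 * np.pi / m
    h = 1.0 + 0.0j  # initial state on the unit circle
    for x in bits:
        # Input-dependent angle (assumption): rotate by 2*pi/m on a 1, stay put on a 0.
        theta = step if x == 1 else 0.0
        # exp(i*theta) is the diagonalized exponential of the skew-symmetric
        # generator [[0, -theta], [theta, 0]], so |h| is preserved.
        h = np.exp(1j * theta) * h
    # Decode the residue from the phase; np.angle returns values in (-pi, pi].
    k = int(np.round(np.angle(h) / step)) % m
    return k, abs(h)


bits = [1, 0, 1, 1, 0, 1, 1]  # five 1s
for m in (2, 3, 5):
    k, radius = mod_count_unitary(bits, m)
    print(f"m={m}: count mod m = {k}, |h| = {radius:.6f}")
```

For the five-ones input, the loop reports residues 1, 2 and 0 for m = 2, 3 and 5. Because the update is a rotation, the state never decays or explodes. This is the property a real-valued, contracting diagonal recurrence lacks when it tries to represent this kind of periodic counter.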
