KOSS: Kalman-Optimal Selective State Spaces for Long-Term Sequence Modeling

Lei Wang; Xin Tan; Mingwei Wang; Ying Zhang

TL;DR

The paper tackles long-term sequence modeling by grounding selective information routing in Kalman estimation, yielding a closed-loop, context-aware mechanism (KOSS) that jointly leverages input content and latent-state context. It introduces the Kalman-Optimal SSM, Innovation-Driven Selectivity (IDS), and the Spectral Differentiation Unit (SDU), together with a segment-wise scan strategy that achieves near-linear throughput on long sequences. Results on synthetic tasks and nine real-world forecasting benchmarks show that KOSS consistently surpasses state-of-the-art baselines in accuracy and stability, and remains robust under irregular, noisy real-world conditions such as secondary surveillance radar (SSR) tracking. The work also provides theoretical and empirical validation of Kalman gain convergence and SDU frequency response, supporting the proposed closed-loop design and its practical scalability across domains.

Abstract

Recent selective state space models (SSMs), such as Mamba and Mamba-2, have demonstrated strong performance in sequence modeling owing to input-dependent selection mechanisms. However, these mechanisms lack theoretical grounding and cannot support context-aware selection from latent state dynamics. To address these limitations, we propose KOSS, a Kalman-Optimal Selective State Space model that formulates selection as latent state uncertainty minimization. Derived from estimation theory, KOSS adopts a continuous-time latent update driven by a Kalman gain that dynamically modulates information propagation based on content and context, enabling a closed-loop, context-aware selectivity mechanism. To ensure stable computation and near-linear scalability, KOSS employs global spectral differentiation for frequency-domain derivative estimation, along with a segment-wise scan for hardware-efficient processing. On a selective copying task with distractors, KOSS achieves over 79% accuracy while baselines drop below 20%, demonstrating robust context-aware selection. Furthermore, across nine long-term forecasting benchmarks, KOSS reduces MSE by 2.92% to 36.23% and consistently outperforms state-of-the-art models in both accuracy and stability. A case study on secondary surveillance radar (SSR) tracking confirms KOSS's robustness under irregular intervals and noisy conditions, demonstrating its real-world applicability. Finally, supplementary experiments verify Kalman gain convergence and the frequency response of spectral differentiation, supporting the proposed closed-loop design.
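
To make the idea of a gain-driven, closed-loop update concrete, the following is a minimal sketch of a textbook scalar Kalman filter. It is not the KOSS architecture: the abstract describes a continuous-time latent update, whereas this sketch assumes a discrete-time model, and the function names (`kalman_step`, `run_filter`, `steady_state_gain`) and all parameter values are hypothetical choices for illustration. It shows how the innovation (the gap between observation and prediction) is weighted by a gain that depends on the current state uncertainty, and how that gain settles to a fixed value when the noise statistics are constant.

```python
import math


def kalman_step(x_est, p, y, a=1.0, c=1.0, q=0.01, r=1.0):
    """One predict/update cycle of a scalar discrete-time Kalman filter.

    Assumed model (illustrative only, not taken from the paper):
        x_t = a * x_{t-1} + w,  w ~ N(0, q)
        y_t = c * x_t + v,      v ~ N(0, r)
    """
    # Predict: propagate the estimate and its variance through the dynamics.
    x_pred = a * x_est
    p_pred = a * p * a + q
    # Innovation: the part of the observation the prediction did not explain.
    innovation = y - c * x_pred
    # Kalman gain: large when the state is uncertain relative to the sensor.
    k = p_pred * c / (c * p_pred * c + r)
    # Update: correct the prediction by the gain-weighted innovation.
    x_new = x_pred + k * innovation
    p_new = (1.0 - k * c) * p_pred
    return x_new, p_new, k, innovation


def run_filter(observations, x0=0.0, p0=1.0, **params):
    """Filter a sequence and record the gain used at every step."""
    x, p = x0, p0
    gains = []
    for y in observations:
        x, p, k, _ = kalman_step(x, p, y, **params)
        gains.append(k)
    return x, gains


def steady_state_gain(q=0.01, r=1.0):
    """Closed-form limit of the gain for the special case a = c = 1.

    The predicted variance at the fixed point solves P^2 = q*P + q*r.
    """
    p_pred = (q + math.sqrt(q * q + 4.0 * q * r)) / 2.0
    return p_pred / (p_pred + r)


if __name__ == "__main__":
    x_final, gains = run_filter([1.0] * 200)
    print("first gain:", gains[0])
    print("last gain:", gains[-1])
    print("closed-form limit:", steady_state_gain())
```

In this toy setting the gain starts high, because the initial variance `p0` is large, and decays toward the closed-form limit as the estimate becomes more certain. This is the simplest version of the gain convergence that the abstract says the supplementary experiments verify.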
