KOSS: Kalman-Optimal Selective State Spaces for Long-Term Sequence Modeling

Lei Wang, Xin Tan, Mingwei Wang, Ying Zhang

TL;DR

The paper tackles long-term sequence modeling by grounding selective information routing in Kalman estimation, yielding a closed-loop, context-aware mechanism (KOSS) that jointly leverages input content and latent-state context. It introduces the Kalman-Optimal SSM, Innovation-Driven Selectivity (IDS), and the Spectral Differentiation Unit (SDU), together with a segment-wise scan strategy to achieve near-linear throughput on long sequences. Empirical results across synthetic tasks and nine real-world forecasting benchmarks show that KOSS consistently surpasses state-of-the-art baselines in accuracy and stability, and remains robust under irregular, noisy real-world conditions such as secondary surveillance radar (SSR) tracking. The work also provides theoretical and empirical validation of Kalman gain convergence and SDU frequency response, supporting the proposed closed-loop design and its practical scalability across domains.

Abstract

Recent selective state space models (SSMs), such as Mamba and Mamba-2, have demonstrated strong performance in sequence modeling owing to input-dependent selection mechanisms. However, these mechanisms lack theoretical grounding and cannot support context-aware selection from latent state dynamics. To address these limitations, we propose KOSS, a Kalman-optimal Selective State Space model that formulates selection as latent state uncertainty minimization. Derived from estimation theory, KOSS adopts a continuous-time latent update driven by a Kalman gain that dynamically modulates information propagation based on content and context, enabling a closed-loop, context-aware selectivity mechanism. To ensure stable computation and near-linear scalability, KOSS employs global spectral differentiation for frequency-domain derivative estimation, along with a segment-wise scan for hardware-efficient processing. On a selective copying task with distractors, KOSS achieves over 79% accuracy while baselines drop below 20%, demonstrating robust context-aware selection. Furthermore, across nine long-term forecasting benchmarks, KOSS reduces MSE by 2.92–36.23% and consistently outperforms state-of-the-art models in both accuracy and stability. To assess real-world applicability, a case study on secondary surveillance radar (SSR) tracking confirms KOSS's robustness under irregular intervals and noisy conditions and demonstrates its effectiveness in real-world applications. Finally, supplementary experiments verify Kalman gain convergence and the frequency response of spectral differentiation, providing theoretical support for the proposed closed-loop design.
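
To make the phrase "frequency-domain derivative estimation" concrete, the following is a minimal sketch of textbook FFT-based spectral differentiation. It is not the paper's SDU: the function name `spectral_derivative`, the uniform-sampling and periodicity assumptions, and the choice to zero the Nyquist bin are all illustrative assumptions.

```python
import numpy as np

def spectral_derivative(x, dt=1.0):
    """Differentiate a uniformly sampled, periodic signal along its last axis.

    Hypothetical helper for illustration; assumes periodic boundary conditions.
    """
    n = x.shape[-1]
    # Angular frequency of each FFT bin (radians per unit time).
    omega = 2.0 * np.pi * np.fft.fftfreq(n, d=dt)
    # Differentiation in time is multiplication by i*omega in frequency.
    multiplier = 1j * omega
    if n % 2 == 0:
        # Zero the Nyquist bin so the derivative of a real signal stays real.
        multiplier[n // 2] = 0.0
    spectrum = np.fft.fft(x, axis=-1)
    return np.fft.ifft(multiplier * spectrum, axis=-1).real

# Check against a signal with a known derivative: d/dt sin(3t) = 3 cos(3t).
t = np.linspace(0.0, 2.0 * np.pi, 256, endpoint=False)
dx = spectral_derivative(np.sin(3.0 * t), dt=t[1] - t[0])
max_err = np.max(np.abs(dx - 3.0 * np.cos(3.0 * t)))
```

Because the whole sequence is transformed at once, the derivative estimate is global rather than a local finite difference, and its cost is O(n log n) in the sequence length.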

Paper Structure

This paper contains 80 sections, 42 equations, 11 figures, 9 tables.

Figures (11)

  • Figure 1: (Left) The Selective Copying task introduces random spacing between inputs and output elements, and can be effectively solved by input-dependent LTV models. (Right) The Context-Aware Selective Copying task extends this setup by adding faint-colored correlated distractors, and requires time-varying models that can precisely and selectively remember or ignore inputs depending on their content and historical context.
  • Figure 2: Time-unrolled KOSS layer: A nonlinear module estimates the Kalman gain $\bm{K}^{(\ell)}$ from $\bm{X}^{(\ell)}$ and $\bm{H}^{(\ell-1)}$, which in turn modulates the dynamic parameters $(\overline{\bm{A}}_K^{(\ell)}, \overline{\bm{B}}_K^{(\ell)})$. The SDU computes input derivatives $\delta \bm{X}$, which are used together with the modulated parameters in a parallel scan module to compute the updated hidden state $\bm{H}^{(\ell)}$.
  • Figure 3: Selective Copying: Performance under distractor interference. KOSS demonstrates strong resilience through innovation-driven selectivity, while Mamba and S4 degrade sharply under contextual noise. Interference is limited to 50% to maintain a learnable signal-to-noise ratio.
  • Figure 4: KOSS vs. S-Mamba: Forecasting comparison on five representative datasets with both input and prediction horizons set to 720. The blue line denotes the ground truth, and the red line indicates model predictions. KOSS exhibits superior long-horizon stability and trend consistency compared to S-Mamba.
  • Figure 5: (Runtime Benchmarks.) Our efficient scan is up to 20× faster than a standard PyTorch implementation during training.
  • ...and 6 more figures
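
For intuition about the Kalman gain that appears in Figure 2 and in the gain-convergence experiments mentioned in the abstract, here is a minimal sketch of the classical scalar discrete-time Kalman filter recursion. It is a textbook illustration only: KOSS estimates its gain with a learned nonlinear module in a continuous-time formulation, and the function name `kalman_gain_sequence` and all parameter values below are assumptions chosen for the example.

```python
def kalman_gain_sequence(a, c, q, r, p0, steps):
    """Return the Kalman gains for the scalar model
    x[k+1] = a*x[k] + w,  w ~ N(0, q)
    y[k]   = c*x[k] + v,  v ~ N(0, r)
    starting from prior error variance p0 (hypothetical helper).
    """
    gains = []
    p = p0
    for _ in range(steps):
        # Predict: propagate the error variance through the dynamics.
        p_pred = a * a * p + q
        # Gain: balance predicted uncertainty against measurement noise.
        k = p_pred * c / (c * c * p_pred + r)
        # Update: the measurement shrinks the error variance.
        p = (1.0 - k * c) * p_pred
        gains.append(k)
    return gains

# Illustrative values: a stable system with a noisy sensor.
gains = kalman_gain_sequence(a=0.9, c=1.0, q=0.1, r=1.0, p0=10.0, steps=50)
gain_change = abs(gains[-1] - gains[-2])
```

With a large initial uncertainty the first gain is close to 1, so the filter trusts the measurement; as the variance recursion settles, the gain decreases toward a steady-state value and `gain_change` becomes negligible.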