SpectraLDS: Provable Distillation for Linear Dynamical Systems
Devan Shah, Shlomo Fortgang, Sofiia Druchyna, Elad Hazan
TL;DR
SpectraLDS is a provable distillation of the filters of a Spectral Transform Unit (STU) into an explicit symmetric linear dynamical system (LDS). It enables constant-time per-token inference while preserving long-range predictive power. The method bridges convex spectral learning and recurrent LDS representations by converting STU filters into LDS parameters, with an approximation error that provably decays exponentially in the number of filters. Because per-token cost no longer grows with sequence length, the method supports scalable, accurate language modeling. Experiments show performance nearly identical to STU baselines with substantial inference speedups. The work thus combines the learning advantages of spectral filtering with the inference efficiency of recurrent models, giving a principled route to efficient sequence modeling with long memory.
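The constant-time claim rests on the fact that an LDS can be evaluated either as a convolution or as a recurrence. The minimal sketch below is our illustration, not code from the paper. It assumes a symmetric LDS that has already been diagonalized by an orthogonal change of basis, so A = diag(alphas) with real eigenvalues, and it uses scalar input and output. It checks that running the system recurrently, with O(d) work per token for state dimension d, gives the same outputs as convolving the input with the kernel K_i = C A^i B, which costs O(t) at step t. The function names `lds_recurrent`, `lds_kernel`, and `convolve_causal` and the numeric values are hypothetical.

```python
from typing import List


def lds_recurrent(alphas: List[float], b: List[float], c: List[float],
                  u: List[float]) -> List[float]:
    """Token-by-token inference: x_t = diag(alphas) x_{t-1} + b u_t, y_t = c . x_t.

    Each step touches only the d-dimensional state, so work and memory per
    token are O(d), independent of how many tokens came before.
    """
    state = [0.0] * len(alphas)
    ys = []
    for u_t in u:
        state = [a * x + bk * u_t for a, x, bk in zip(alphas, state, b)]
        ys.append(sum(ck * x for ck, x in zip(c, state)))
    return ys


def lds_kernel(alphas: List[float], b: List[float], c: List[float],
               length: int) -> List[float]:
    """Convolution kernel K_i = C A^i B for i = 0, ..., length - 1."""
    return [sum(ck * bk * a ** i for a, bk, ck in zip(alphas, b, c))
            for i in range(length)]


def convolve_causal(kernel: List[float], u: List[float]) -> List[float]:
    """Causal convolution y_t = sum_{i=0}^{t} K_i u_{t-i}; step t costs O(t)."""
    return [sum(kernel[i] * u[t - i] for i in range(t + 1))
            for t in range(len(u))]


alphas = [0.9, 0.5, 0.1]   # assumed eigenvalues of the symmetric A
b = [1.0, 0.5, 2.0]        # input weights after the orthogonal change of basis
c = [0.2, -1.0, 0.7]       # output weights after the orthogonal change of basis
u = [1.0, 0.0, -2.0, 3.0, 0.5, 0.0, 1.5]

y_rec = lds_recurrent(alphas, b, c, u)
y_conv = convolve_causal(lds_kernel(alphas, b, c, len(u)), u)
assert all(abs(p - q) < 1e-9 for p, q in zip(y_rec, y_conv))
```

The two views agree exactly. The recurrent form is what makes per-token inference independent of sequence length, which is why a convolutional model is worth distilling into an LDS in the first place.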
Abstract
We present the first provable method for identifying symmetric linear dynamical systems (LDS) with accuracy guarantees that are independent of the systems' state dimension or effective memory. Our approach builds upon recent work that represents symmetric LDSs as convolutions learnable via fixed spectral transformations. We show how to invert this representation, thereby recovering an LDS model from its spectral transform and yielding an end-to-end convex optimization procedure. This distillation preserves predictive accuracy while enabling constant-time and constant-space inference per token, independent of sequence length. We evaluate our method, SpectraLDS, as a component in sequence prediction architectures and demonstrate that accuracy is preserved while inference efficiency is improved on tasks such as language modeling.
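The inversion step can be illustrated, in a deliberately simplified form, by fitting one spectral filter with the impulse response of a diagonal LDS. This sketch is a stand-in and not the SpectraLDS algorithm. It uses the Hankel matrix Z_ij = 2 / ((i+j)^3 - (i+j)) from the spectral filtering literature, takes only its top eigenvector (found by power iteration) as the filter, and fixes a hand-picked grid of decay rates (an assumption) so that fitting the output weights reduces to a convex least-squares problem. The resulting LDS has A = diag(alphas), B = 1, and C equal to the fitted weights. All function names, the grid, and the ridge value are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def hankel_z(n: int) -> Matrix:
    """Hankel matrix Z_ij = 2 / ((i+j)^3 - (i+j)) with 1-based indices i, j."""
    return [[2.0 / ((i + j) ** 3 - (i + j)) for j in range(1, n + 1)]
            for i in range(1, n + 1)]


def top_eigenvector(m: Matrix, iters: int = 200) -> List[float]:
    """Power iteration; Z has positive entries, so the limit is the positive top eigenvector."""
    v = [1.0] * len(m)
    for _ in range(iters):
        w = [sum(r * x for r, x in zip(row, v)) for row in m]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


def solve(a: Matrix, y: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense system a x = y."""
    n = len(a)
    m = [row[:] + [y[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fit_decay_rates(phi: List[float], alphas: List[float],
                    ridge: float = 1e-8) -> List[float]:
    """Ridge least squares: w minimizing sum_i (sum_k w_k alpha_k^i - phi_i)^2 + ridge ||w||^2."""
    basis = [[a ** i for i in range(len(phi))] for a in alphas]
    gram = [[sum(p * q for p, q in zip(bk, bl)) + (ridge if k == l else 0.0)
             for l, bl in enumerate(basis)] for k, bk in enumerate(basis)]
    rhs = [sum(p * q for p, q in zip(bk, phi)) for bk in basis]
    return solve(gram, rhs)


def impulse_response(alphas: List[float], c: List[float], length: int) -> List[float]:
    """Run x_t = diag(alphas) x_{t-1} + u_t, y_t = c . x_t on a unit impulse."""
    state = [0.0] * len(alphas)
    out = []
    for t in range(length):
        u_t = 1.0 if t == 0 else 0.0
        state = [a * x + u_t for a, x in zip(alphas, state)]
        out.append(sum(ck * x for ck, x in zip(c, state)))
    return out


L = 32
phi = top_eigenvector(hankel_z(L))                       # top spectral filter
alphas = [0.05, 0.2, 0.4, 0.6, 0.8, 0.9, 0.95, 0.99]     # assumed decay-rate grid
c = fit_decay_rates(phi, alphas)                         # fitted output weights C
approx = impulse_response(alphas, c, L)                  # filter of the distilled LDS
err = sum((p - q) ** 2 for p, q in zip(phi, approx)) ** 0.5
print(f"relative L2 error: {err:.2e}")                   # phi has unit norm
```

Fixing the decay rates keeps the fit convex, since only the output weights are learned. Once fitted, the distilled system reproduces the filter through its recurrence and can serve it in constant time per token. The quality of the approximation depends on the chosen grid.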
